In [ ]:
######################################### House Rent Problem #################################################################

Problem Statement

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Goal

It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.

Metric

Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)

Reference Link : https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview/description

File descriptions

train.csv - the training set test.csv - the test set data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms

Data fields

Here's a brief version of what you'll find in the data description file.

SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.

MSSubClass: The building class

MSZoning: The general zoning classification

LotFrontage: Linear feet of street connected to property

LotArea: Lot size in square feet

Street: Type of road access

Alley: Type of alley access

LotShape: General shape of property

LandContour: Flatness of the property

Utilities: Type of utilities available

LotConfig: Lot configuration

LandSlope: Slope of property

Neighborhood: Physical locations within Ames city limits

Condition1: Proximity to main road or railroad

Condition2: Proximity to main road or railroad (if a second is present)

BldgType: Type of dwelling

HouseStyle: Style of dwelling

OverallQual: Overall material and finish quality

OverallCond: Overall condition rating

YearBuilt: Original construction date

YearRemodAdd: Remodel date

RoofStyle: Type of roof

RoofMatl: Roof material

Exterior1st: Exterior covering on house

Exterior2nd: Exterior covering on house (if more than one material)

MasVnrType: Masonry veneer type

MasVnrArea: Masonry veneer area in square feet

ExterQual: Exterior material quality

ExterCond: Present condition of the material on the exterior

Foundation: Type of foundation

BsmtQual: Height of the basement

BsmtCond: General condition of the basement

BsmtExposure: Walkout or garden level basement walls

BsmtFinType1: Quality of basement finished area

BsmtFinSF1: Type 1 finished square feet

BsmtFinType2: Quality of second finished area (if present)

BsmtFinSF2: Type 2 finished square feet

BsmtUnfSF: Unfinished square feet of basement area

TotalBsmtSF: Total square feet of basement area

Heating: Type of heating

HeatingQC: Heating quality and condition

CentralAir: Central air conditioning

Electrical: Electrical system

1stFlrSF: First Floor square feet

2ndFlrSF: Second floor square feet

LowQualFinSF: Low quality finished square feet (all floors)

GrLivArea: Above grade (ground) living area square feet

BsmtFullBath: Basement full bathrooms

BsmtHalfBath: Basement half bathrooms

FullBath: Full bathrooms above grade

HalfBath: Half baths above grade

Bedroom: Number of bedrooms above basement level

Kitchen: Number of kitchens

KitchenQual: Kitchen quality

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality rating

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

GarageType: Garage location

GarageYrBlt: Year garage was built

GarageFinish: Interior finish of the garage

GarageCars: Size of garage in car capacity

GarageArea: Size of garage in square feet

GarageQual: Garage quality

GarageCond: Garage condition

PavedDrive: Paved driveway

WoodDeckSF: Wood deck area in square feet

OpenPorchSF: Open porch area in square feet

EnclosedPorch: Enclosed porch area in square feet

3SsnPorch: Three season porch area in square feet

ScreenPorch: Screen porch area in square feet

PoolArea: Pool area in square feet

PoolQC: Pool quality

Fence: Fence quality

MiscFeature: Miscellaneous feature not covered in other categories

MiscVal: $Value of miscellaneous feature

MoSold: Month Sold

YrSold: Year Sold

SaleType: Type of sale

SaleCondition: Condition of sale

In [1]:
## Import necessary libraries.

import numpy as np ## Numpy Library ( will use to convert data frame to array or creating array etc...).
import pandas as pd ## Pandas Library (will use to load data,create data frame...etc).
import os ## For connecting to machine to get path for reading/writing files.
from sklearn.model_selection import train_test_split ## For splitting data into train and validation.
import matplotlib.pyplot as plt ## For visualization.
import seaborn as sns ## For visualization.
In [3]:
## Get current working directory.
os.getcwd()
Out[3]:
'D:\\Python\\Pratice'
In [4]:
## Set working directory
os.chdir("D:\\DataScience\\Pratice\\House_Rent_Price")
In [5]:
## Read the train data set.
data = pd.read_csv("train.csv",header='infer',sep=',')
In [6]:
## Read the test data set.
test_data = pd.read_csv("test.csv",header='infer',sep=',')
In [704]:
## Set how many rows and columns you want to display in jupyter notebook.
pd.options.display.max_columns = 200 
pd.get_option('display.max_rows') 
pd.set_option('display.max_rows',None) 
In [705]:
## Check dimesnions of train data.
data.shape
Out[705]:
(1460, 81)
In [706]:
## Check dimensions of test data.
test_data.shape
Out[706]:
(1459, 80)
In [707]:
## Get first 5 records of train data.
data.head()
Out[707]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000
In [708]:
## Get first 5 records of train data.
test_data.head()
Out[708]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
0 1461 20 RH 80.0 11622 Pave NaN Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 1961 1961 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA No Rec 468.0 LwQ 144.0 270.0 882.0 GasA TA Y SBrkr 896 0 0 896 0.0 0.0 1 0 2 1 TA 5 Typ 0 NaN Attchd 1961.0 Unf 1.0 730.0 TA TA Y 140 0 0 0 120 0 NaN MnPrv NaN 0 6 2010 WD Normal
1 1462 20 RL 81.0 14267 Pave NaN IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 6 1958 1958 Hip CompShg Wd Sdng Wd Sdng BrkFace 108.0 TA TA CBlock TA TA No ALQ 923.0 Unf 0.0 406.0 1329.0 GasA TA Y SBrkr 1329 0 0 1329 0.0 0.0 1 1 3 1 Gd 6 Typ 0 NaN Attchd 1958.0 Unf 1.0 312.0 TA TA Y 393 36 0 0 0 0 NaN NaN Gar2 12500 6 2010 WD Normal
2 1463 60 RL 74.0 13830 Pave NaN IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 5 5 1997 1998 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc Gd TA No GLQ 791.0 Unf 0.0 137.0 928.0 GasA Gd Y SBrkr 928 701 0 1629 0.0 0.0 2 1 3 1 TA 6 Typ 1 TA Attchd 1997.0 Fin 2.0 482.0 TA TA Y 212 34 0 0 0 0 NaN MnPrv NaN 0 3 2010 WD Normal
3 1464 60 RL 78.0 9978 Pave NaN IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 6 1998 1998 Gable CompShg VinylSd VinylSd BrkFace 20.0 TA TA PConc TA TA No GLQ 602.0 Unf 0.0 324.0 926.0 GasA Ex Y SBrkr 926 678 0 1604 0.0 0.0 2 1 3 1 Gd 7 Typ 1 Gd Attchd 1998.0 Fin 2.0 470.0 TA TA Y 360 36 0 0 0 0 NaN NaN NaN 0 6 2010 WD Normal
4 1465 120 RL 43.0 5005 Pave NaN IR1 HLS AllPub Inside Gtl StoneBr Norm Norm TwnhsE 1Story 8 5 1992 1992 Gable CompShg HdBoard HdBoard None 0.0 Gd TA PConc Gd TA No ALQ 263.0 Unf 0.0 1017.0 1280.0 GasA Ex Y SBrkr 1280 0 0 1280 0.0 0.0 2 0 2 1 Gd 5 Typ 0 NaN Attchd 1992.0 RFn 2.0 506.0 TA TA Y 0 82 0 0 144 0 NaN NaN NaN 0 1 2010 WD Normal
In [709]:
## Check summary statistics of train data.
data.describe(include='all')
Out[709]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
count 1460.000000 1460.000000 1460 1201.000000 1460.000000 1460 91 1460 1460 1460 1460 1460 1460 1460 1460 1460 1460 1460.000000 1460.000000 1460.000000 1460.000000 1460 1460 1460 1460 1452 1452.000000 1460 1460 1460 1423 1423 1422 1423 1460.000000 1422 1460.000000 1460.000000 1460.000000 1460 1460 1460 1459 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460 1460.000000 1460 1460.000000 770 1379 1379.000000 1379 1460.000000 1460.000000 1379 1379 1460 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000 7 281 54 1460.000000 1460.000000 1460.000000 1460 1460 1460.000000
unique NaN NaN 5 NaN NaN 2 2 4 4 2 5 3 25 9 8 5 8 NaN NaN NaN NaN 6 8 15 16 4 NaN 4 5 6 4 4 4 6 NaN 6 NaN NaN NaN 6 5 2 5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN 7 NaN 5 6 NaN 3 NaN NaN 5 5 3 NaN NaN NaN NaN NaN NaN 3 4 4 NaN NaN NaN 9 6 NaN
top NaN NaN RL NaN NaN Pave Grvl Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story NaN NaN NaN NaN Gable CompShg VinylSd VinylSd None NaN TA TA PConc TA TA No Unf NaN Unf NaN NaN NaN GasA Ex Y SBrkr NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN TA NaN Typ NaN Gd Attchd NaN Unf NaN NaN TA TA Y NaN NaN NaN NaN NaN NaN Gd MnPrv Shed NaN NaN NaN WD Normal NaN
freq NaN NaN 1151 NaN NaN 1454 50 925 1311 1459 1052 1382 225 1260 1445 1220 726 NaN NaN NaN NaN 1141 1434 515 504 864 NaN 906 1282 647 649 1311 953 430 NaN 1256 NaN NaN NaN 1428 741 1365 1334 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 735 NaN 1360 NaN 380 870 NaN 605 NaN NaN 1311 1326 1340 NaN NaN NaN NaN NaN NaN 3 157 49 NaN NaN NaN 1267 1198 NaN
mean 730.500000 56.897260 NaN 70.049958 10516.828082 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.099315 5.575342 1971.267808 1984.865753 NaN NaN NaN NaN NaN 103.685262 NaN NaN NaN NaN NaN NaN NaN 443.639726 NaN 46.549315 567.240411 1057.429452 NaN NaN NaN NaN 1162.626712 346.992466 5.844521 1515.463699 0.425342 0.057534 1.565068 0.382877 2.866438 1.046575 NaN 6.517808 NaN 0.613014 NaN NaN 1978.506164 NaN 1.767123 472.980137 NaN NaN NaN 94.244521 46.660274 21.954110 3.409589 15.060959 2.758904 NaN NaN NaN 43.489041 6.321918 2007.815753 NaN NaN 180921.195890
std 421.610009 42.300571 NaN 24.284752 9981.264932 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.382997 1.112799 30.202904 20.645407 NaN NaN NaN NaN NaN 181.066207 NaN NaN NaN NaN NaN NaN NaN 456.098091 NaN 161.319273 441.866955 438.705324 NaN NaN NaN NaN 386.587738 436.528436 48.623081 525.480383 0.518911 0.238753 0.550916 0.502885 0.815778 0.220338 NaN 1.625393 NaN 0.644666 NaN NaN 24.689725 NaN 0.747315 213.804841 NaN NaN NaN 125.338794 66.256028 61.119149 29.317331 55.757415 40.177307 NaN NaN NaN 496.123024 2.703626 1.328095 NaN NaN 79442.502883
min 1.000000 20.000000 NaN 21.000000 1300.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.000000 1.000000 1872.000000 1950.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN 0.000000 0.000000 0.000000 NaN NaN NaN NaN 334.000000 0.000000 0.000000 334.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 2.000000 NaN 0.000000 NaN NaN 1900.000000 NaN 0.000000 0.000000 NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 1.000000 2006.000000 NaN NaN 34900.000000
25% 365.750000 20.000000 NaN 59.000000 7553.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5.000000 5.000000 1954.000000 1967.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN 0.000000 223.000000 795.750000 NaN NaN NaN NaN 882.000000 0.000000 0.000000 1129.500000 0.000000 0.000000 1.000000 0.000000 2.000000 1.000000 NaN 5.000000 NaN 0.000000 NaN NaN 1961.000000 NaN 1.000000 334.500000 NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 5.000000 2007.000000 NaN NaN 129975.000000
50% 730.500000 50.000000 NaN 69.000000 9478.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.000000 5.000000 1973.000000 1994.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 383.500000 NaN 0.000000 477.500000 991.500000 NaN NaN NaN NaN 1087.000000 0.000000 0.000000 1464.000000 0.000000 0.000000 2.000000 0.000000 3.000000 1.000000 NaN 6.000000 NaN 1.000000 NaN NaN 1980.000000 NaN 2.000000 480.000000 NaN NaN NaN 0.000000 25.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 6.000000 2008.000000 NaN NaN 163000.000000
75% 1095.250000 70.000000 NaN 80.000000 11601.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7.000000 6.000000 2000.000000 2004.000000 NaN NaN NaN NaN NaN 166.000000 NaN NaN NaN NaN NaN NaN NaN 712.250000 NaN 0.000000 808.000000 1298.250000 NaN NaN NaN NaN 1391.250000 728.000000 0.000000 1776.750000 1.000000 0.000000 2.000000 1.000000 3.000000 1.000000 NaN 7.000000 NaN 1.000000 NaN NaN 2002.000000 NaN 2.000000 576.000000 NaN NaN NaN 168.000000 68.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 8.000000 2009.000000 NaN NaN 214000.000000
max 1460.000000 190.000000 NaN 313.000000 215245.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.000000 9.000000 2010.000000 2010.000000 NaN NaN NaN NaN NaN 1600.000000 NaN NaN NaN NaN NaN NaN NaN 5644.000000 NaN 1474.000000 2336.000000 6110.000000 NaN NaN NaN NaN 4692.000000 2065.000000 572.000000 5642.000000 3.000000 2.000000 3.000000 2.000000 8.000000 3.000000 NaN 14.000000 NaN 3.000000 NaN NaN 2010.000000 NaN 4.000000 1418.000000 NaN NaN NaN 857.000000 547.000000 552.000000 508.000000 480.000000 738.000000 NaN NaN NaN 15500.000000 12.000000 2010.000000 NaN NaN 755000.000000
In [710]:
## Check summary statistics of test data.
test_data.describe(include='all')
Out[710]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
count 1459.000000 1459.000000 1455 1232.000000 1459.000000 1459 107 1459 1459 1457 1459 1459 1459 1459 1459 1459 1459 1459.000000 1459.000000 1459.000000 1459.000000 1459 1459 1458 1458 1443 1444.000000 1459 1459 1459 1415 1414 1415 1417 1458.000000 1417 1458.000000 1458.000000 1458.000000 1459 1459 1459 1459 1459.000000 1459.000000 1459.000000 1459.000000 1457.000000 1457.000000 1459.000000 1459.000000 1459.000000 1459.000000 1458 1459.000000 1457 1459.00000 729 1383 1381.000000 1381 1458.000000 1458.000000 1381 1381 1459 1459.000000 1459.000000 1459.000000 1459.000000 1459.000000 1459.000000 3 290 51 1459.000000 1459.000000 1459.000000 1458 1459
unique NaN NaN 5 NaN NaN 2 2 4 4 1 5 3 25 9 5 5 7 NaN NaN NaN NaN 6 4 13 15 4 NaN 4 5 6 4 4 4 6 NaN 6 NaN NaN NaN 4 5 2 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN 7 NaN 5 6 NaN 3 NaN NaN 4 5 3 NaN NaN NaN NaN NaN NaN 2 4 3 NaN NaN NaN 9 6
top NaN NaN RL NaN NaN Pave Grvl Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story NaN NaN NaN NaN Gable CompShg VinylSd VinylSd None NaN TA TA PConc TA TA No GLQ NaN Unf NaN NaN NaN GasA Ex Y SBrkr NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN TA NaN Typ NaN Gd Attchd NaN Unf NaN NaN TA TA Y NaN NaN NaN NaN NaN NaN Ex MnPrv Shed NaN NaN NaN WD Normal
freq NaN NaN 1114 NaN NaN 1453 70 934 1311 1457 1081 1396 218 1251 1444 1205 745 NaN NaN NaN NaN 1169 1442 510 510 878 NaN 892 1256 661 634 1295 951 431 NaN 1237 NaN NaN NaN 1446 752 1358 1337 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 757 NaN 1357 NaN 364 853 NaN 625 NaN NaN 1293 1328 1301 NaN NaN NaN NaN NaN NaN 2 172 46 NaN NaN NaN 1258 1204
mean 2190.000000 57.378341 NaN 68.580357 9819.161069 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.078821 5.553804 1971.357779 1983.662783 NaN NaN NaN NaN NaN 100.709141 NaN NaN NaN NaN NaN NaN NaN 439.203704 NaN 52.619342 554.294925 1046.117970 NaN NaN NaN NaN 1156.534613 325.967786 3.543523 1486.045922 0.434454 0.065202 1.570939 0.377656 2.854010 1.042495 NaN 6.385195 NaN 0.58122 NaN NaN 1977.721217 NaN 1.766118 472.768861 NaN NaN NaN 93.174777 48.313914 24.243317 1.794380 17.064428 1.744345 NaN NaN NaN 58.167923 6.104181 2007.769705 NaN NaN
std 421.321334 42.746880 NaN 22.376841 4955.517327 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.436812 1.113740 30.390071 21.130467 NaN NaN NaN NaN NaN 177.625900 NaN NaN NaN NaN NaN NaN NaN 455.268042 NaN 176.753926 437.260486 442.898624 NaN NaN NaN NaN 398.165820 420.610226 44.043251 485.566099 0.530648 0.252468 0.555190 0.503017 0.829788 0.208472 NaN 1.508895 NaN 0.64742 NaN NaN 26.431175 NaN 0.775945 217.048611 NaN NaN NaN 127.744882 68.883364 67.227765 20.207842 56.609763 30.491646 NaN NaN NaN 630.806978 2.722432 1.301740 NaN NaN
min 1461.000000 20.000000 NaN 21.000000 1470.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.000000 1.000000 1879.000000 1950.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN 0.000000 0.000000 0.000000 NaN NaN NaN NaN 407.000000 0.000000 0.000000 407.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 3.000000 NaN 0.00000 NaN NaN 1895.000000 NaN 0.000000 0.000000 NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 1.000000 2006.000000 NaN NaN
25% 1825.500000 20.000000 NaN 58.000000 7391.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5.000000 5.000000 1953.000000 1963.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 0.000000 NaN 0.000000 219.250000 784.000000 NaN NaN NaN NaN 873.500000 0.000000 0.000000 1117.500000 0.000000 0.000000 1.000000 0.000000 2.000000 1.000000 NaN 5.000000 NaN 0.00000 NaN NaN 1959.000000 NaN 1.000000 318.000000 NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 4.000000 2007.000000 NaN NaN
50% 2190.000000 50.000000 NaN 67.000000 9399.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.000000 5.000000 1973.000000 1992.000000 NaN NaN NaN NaN NaN 0.000000 NaN NaN NaN NaN NaN NaN NaN 350.500000 NaN 0.000000 460.000000 988.000000 NaN NaN NaN NaN 1079.000000 0.000000 0.000000 1432.000000 0.000000 0.000000 2.000000 0.000000 3.000000 1.000000 NaN 6.000000 NaN 0.00000 NaN NaN 1979.000000 NaN 2.000000 480.000000 NaN NaN NaN 0.000000 28.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 6.000000 2008.000000 NaN NaN
75% 2554.500000 70.000000 NaN 80.000000 11517.500000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7.000000 6.000000 2001.000000 2004.000000 NaN NaN NaN NaN NaN 164.000000 NaN NaN NaN NaN NaN NaN NaN 753.500000 NaN 0.000000 797.750000 1305.000000 NaN NaN NaN NaN 1382.500000 676.000000 0.000000 1721.000000 1.000000 0.000000 2.000000 1.000000 3.000000 1.000000 NaN 7.000000 NaN 1.00000 NaN NaN 2002.000000 NaN 2.000000 576.000000 NaN NaN NaN 168.000000 72.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN 0.000000 8.000000 2009.000000 NaN NaN
max 2919.000000 190.000000 NaN 200.000000 56600.000000 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.000000 9.000000 2010.000000 2010.000000 NaN NaN NaN NaN NaN 1290.000000 NaN NaN NaN NaN NaN NaN NaN 4010.000000 NaN 1526.000000 2140.000000 5095.000000 NaN NaN NaN NaN 5095.000000 1862.000000 1064.000000 5095.000000 3.000000 2.000000 4.000000 2.000000 6.000000 2.000000 NaN 15.000000 NaN 4.00000 NaN NaN 2207.000000 NaN 5.000000 1488.000000 NaN NaN NaN 1424.000000 742.000000 1012.000000 360.000000 576.000000 800.000000 NaN NaN NaN 17000.000000 12.000000 2010.000000 NaN NaN
In [712]:
## Get columns data types of train data.
data.dtypes
Out[712]:
Id                 int64
MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
Street            object
Alley             object
LotShape          object
LandContour       object
Utilities         object
LotConfig         object
LandSlope         object
Neighborhood      object
Condition1        object
Condition2        object
BldgType          object
HouseStyle        object
OverallQual        int64
OverallCond        int64
YearBuilt          int64
YearRemodAdd       int64
RoofStyle         object
RoofMatl          object
Exterior1st       object
Exterior2nd       object
MasVnrType        object
MasVnrArea       float64
ExterQual         object
ExterCond         object
Foundation        object
BsmtQual          object
BsmtCond          object
BsmtExposure      object
BsmtFinType1      object
BsmtFinSF1         int64
BsmtFinType2      object
BsmtFinSF2         int64
BsmtUnfSF          int64
TotalBsmtSF        int64
Heating           object
HeatingQC         object
CentralAir        object
Electrical        object
1stFlrSF           int64
2ndFlrSF           int64
LowQualFinSF       int64
GrLivArea          int64
BsmtFullBath       int64
BsmtHalfBath       int64
FullBath           int64
HalfBath           int64
BedroomAbvGr       int64
KitchenAbvGr       int64
KitchenQual       object
TotRmsAbvGrd       int64
Functional        object
Fireplaces         int64
FireplaceQu       object
GarageType        object
GarageYrBlt      float64
GarageFinish      object
GarageCars         int64
GarageArea         int64
GarageQual        object
GarageCond        object
PavedDrive        object
WoodDeckSF         int64
OpenPorchSF        int64
EnclosedPorch      int64
3SsnPorch          int64
ScreenPorch        int64
PoolArea           int64
PoolQC            object
Fence             object
MiscFeature       object
MiscVal            int64
MoSold             int64
YrSold             int64
SaleType          object
SaleCondition     object
SalePrice          int64
dtype: object
In [713]:
## Get columns data types of test data.
test_data.dtypes
Out[713]:
Id                 int64
MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
Street            object
Alley             object
LotShape          object
LandContour       object
Utilities         object
LotConfig         object
LandSlope         object
Neighborhood      object
Condition1        object
Condition2        object
BldgType          object
HouseStyle        object
OverallQual        int64
OverallCond        int64
YearBuilt          int64
YearRemodAdd       int64
RoofStyle         object
RoofMatl          object
Exterior1st       object
Exterior2nd       object
MasVnrType        object
MasVnrArea       float64
ExterQual         object
ExterCond         object
Foundation        object
BsmtQual          object
BsmtCond          object
BsmtExposure      object
BsmtFinType1      object
BsmtFinSF1       float64
BsmtFinType2      object
BsmtFinSF2       float64
BsmtUnfSF        float64
TotalBsmtSF      float64
Heating           object
HeatingQC         object
CentralAir        object
Electrical        object
1stFlrSF           int64
2ndFlrSF           int64
LowQualFinSF       int64
GrLivArea          int64
BsmtFullBath     float64
BsmtHalfBath     float64
FullBath           int64
HalfBath           int64
BedroomAbvGr       int64
KitchenAbvGr       int64
KitchenQual       object
TotRmsAbvGrd       int64
Functional        object
Fireplaces         int64
FireplaceQu       object
GarageType        object
GarageYrBlt      float64
GarageFinish      object
GarageCars       float64
GarageArea       float64
GarageQual        object
GarageCond        object
PavedDrive        object
WoodDeckSF         int64
OpenPorchSF        int64
EnclosedPorch      int64
3SsnPorch          int64
ScreenPorch        int64
PoolArea           int64
PoolQC            object
Fence             object
MiscFeature       object
MiscVal            int64
MoSold             int64
YrSold             int64
SaleType          object
SaleCondition     object
dtype: object
In [ ]:
## EDA
In [715]:
## Plot scatter matrix for train data.
pd.plotting.scatter_matrix(data, figsize=(75, 75), diagonal='kde')
plt.show()
In [716]:
## Plot scatter matrix for test data.
pd.plotting.scatter_matrix(test_data, figsize=(75, 75), diagonal='kde')
plt.show()
In [19]:
## Plot correlation matrix for train data.
plt.figure(figsize=(75,75))
sns.heatmap(data.corr(),cmap='coolwarm',annot = True)
plt.show()
In [20]:
## Plot correlation matrix for test data.
plt.figure(figsize=(75,75))
sns.heatmap(test_data.corr(),cmap='coolwarm',annot = True)
plt.show()
In [960]:
## Plot Probability plot for target varible.
from scipy import stats
#Get also the QQ-plot
fig = plt.figure()
res = stats.probplot(data['SalePrice'], plot=plt)
plt.show() 
In [962]:
## Display distribution plot for target,GarageCars columns.
plt.figure(figsize=(16,8))
sns.boxplot(x='GarageCars',y='SalePrice',data=data)
plt.show()
In [964]:
## Display scatter plot for GarageArea,target columns.
sns.lmplot(x='GarageArea',y='SalePrice',data=data)
C:\Users\nagar\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
Out[964]:
<seaborn.axisgrid.FacetGrid at 0x209072cecf8>
In [969]:
## Saleprice correlation matrix
k = 10 ## Number of variables for heatmap
plt.figure(figsize=(16,8))
corrmat = data.corr()
## Picking the top 15 correlated features
cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(data[cols].values.T)
sns.set(font_scale=1.25)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values)
plt.show()
In [972]:
## Plot Histogram for target column.
sns.distplot(data['SalePrice']);
C:\Users\nagar\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
In [974]:
## Check skewness and kurtosis for target varible.
print("Skewness: %f" % data['SalePrice'].skew())
print("Kurtosis: %f" % data['SalePrice'].kurt())
Skewness: 1.882876
Kurtosis: 6.536282
In [980]:
## Scatter plot for some columns of train data.
sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'YearBuilt']
sns.pairplot(data[cols], size = 2.5)
plt.show();
In [982]:
## Histogram and normal probability plot.
from scipy.stats import norm
sns.distplot(data['SalePrice'], fit=norm);
fig = plt.figure()
res = stats.probplot(data['SalePrice'], plot=plt)
C:\Users\nagar\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
In [987]:
## Display correlation plot in desecnding order.
corr = data.select_dtypes(include=['int64','float64']).corr()
plt.figure(figsize=(16,6))
corr['SalePrice'].sort_values(ascending=False)[1:].plot(kind='bar')
plt.tight_layout()
In [989]:
## Visualize missing data.
missing_value = data.isnull().sum().sort_values(ascending=False) / len(data) * 100
missing_value = missing_value[missing_value != 0]
missing_value = pd.DataFrame({'Missing value' :missing_value,'Type':missing_value.index.map(lambda x:data[x].dtype)})
missing_value.plot(kind='bar',figsize=(16,4))
plt.show()
In [9]:
## Display distribution plots.
quantitative = data.select_dtypes('int64')
f = pd.melt(data, value_vars=quantitative)
g = sns.FacetGrid(f, col="variable",  col_wrap=2, sharex=False, sharey=False)
g = g.map(sns.distplot, "value")
C:\Users\nagar\Anaconda3\lib\site-packages\scipy\stats\stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
C:\Users\nagar\Anaconda3\lib\site-packages\seaborn\distributions.py:218: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  color=hist_color, **hist_kws)
In [16]:
## Display distribution plots.
qualitative = data.select_dtypes(['object','category'])
temp = data
for c in qualitative:
    temp[c] = temp[c].astype('category')
    if temp[c].isnull().any():
        temp[c] = temp[c].cat.add_categories(['MISSING'])
        temp[c] = temp[c].fillna('MISSING')

def boxplot(x, y, **kwargs):
    sns.boxplot(x=x, y=y)
    x=plt.xticks(rotation=90)
f = pd.melt(temp, id_vars=['SalePrice'], value_vars=qualitative)
g = sns.FacetGrid(f, col="variable",  col_wrap=2, sharex=False, sharey=False, size=5)
g = g.map(boxplot, "value", "SalePrice")
In [21]:
## Plot histograms for train data.
(data.select_dtypes(include = ['float64', 'int64'])).hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
In [22]:
## Pair Plot.
temp_1 =data.select_dtypes(include = ['float64', 'int64'])
for i in range(0, len(temp_1.columns), 5):
    sns.pairplot(data=temp_1,
                x_vars=temp_1.columns[i:i+5],
                y_vars=['SalePrice'])
In [37]:
## Magic command.
%matplotlib inline
In [719]:
## Get missing values for train data.
data.isna().sum()
Out[719]:
Id                  0
MSSubClass          0
MSZoning            0
LotFrontage       259
LotArea             0
Street              0
Alley            1369
LotShape            0
LandContour         0
Utilities           0
LotConfig           0
LandSlope           0
Neighborhood        0
Condition1          0
Condition2          0
BldgType            0
HouseStyle          0
OverallQual         0
OverallCond         0
YearBuilt           0
YearRemodAdd        0
RoofStyle           0
RoofMatl            0
Exterior1st         0
Exterior2nd         0
MasVnrType          8
MasVnrArea          8
ExterQual           0
ExterCond           0
Foundation          0
BsmtQual           37
BsmtCond           37
BsmtExposure       38
BsmtFinType1       37
BsmtFinSF1          0
BsmtFinType2       38
BsmtFinSF2          0
BsmtUnfSF           0
TotalBsmtSF         0
Heating             0
HeatingQC           0
CentralAir          0
Electrical          1
1stFlrSF            0
2ndFlrSF            0
LowQualFinSF        0
GrLivArea           0
BsmtFullBath        0
BsmtHalfBath        0
FullBath            0
HalfBath            0
BedroomAbvGr        0
KitchenAbvGr        0
KitchenQual         0
TotRmsAbvGrd        0
Functional          0
Fireplaces          0
FireplaceQu       690
GarageType         81
GarageYrBlt        81
GarageFinish       81
GarageCars          0
GarageArea          0
GarageQual         81
GarageCond         81
PavedDrive          0
WoodDeckSF          0
OpenPorchSF         0
EnclosedPorch       0
3SsnPorch           0
ScreenPorch         0
PoolArea            0
PoolQC           1453
Fence            1179
MiscFeature      1406
MiscVal             0
MoSold              0
YrSold              0
SaleType            0
SaleCondition       0
SalePrice           0
dtype: int64
In [720]:
## Get missing values for test data.
test_data.isna().sum()
Out[720]:
Id                  0
MSSubClass          0
MSZoning            4
LotFrontage       227
LotArea             0
Street              0
Alley            1352
LotShape            0
LandContour         0
Utilities           2
LotConfig           0
LandSlope           0
Neighborhood        0
Condition1          0
Condition2          0
BldgType            0
HouseStyle          0
OverallQual         0
OverallCond         0
YearBuilt           0
YearRemodAdd        0
RoofStyle           0
RoofMatl            0
Exterior1st         1
Exterior2nd         1
MasVnrType         16
MasVnrArea         15
ExterQual           0
ExterCond           0
Foundation          0
BsmtQual           44
BsmtCond           45
BsmtExposure       44
BsmtFinType1       42
BsmtFinSF1          1
BsmtFinType2       42
BsmtFinSF2          1
BsmtUnfSF           1
TotalBsmtSF         1
Heating             0
HeatingQC           0
CentralAir          0
Electrical          0
1stFlrSF            0
2ndFlrSF            0
LowQualFinSF        0
GrLivArea           0
BsmtFullBath        2
BsmtHalfBath        2
FullBath            0
HalfBath            0
BedroomAbvGr        0
KitchenAbvGr        0
KitchenQual         1
TotRmsAbvGrd        0
Functional          2
Fireplaces          0
FireplaceQu       730
GarageType         76
GarageYrBlt        78
GarageFinish       78
GarageCars          1
GarageArea          1
GarageQual         78
GarageCond         78
PavedDrive          0
WoodDeckSF          0
OpenPorchSF         0
EnclosedPorch       0
3SsnPorch           0
ScreenPorch         0
PoolArea            0
PoolQC           1456
Fence            1169
MiscFeature      1408
MiscVal             0
MoSold              0
YrSold              0
SaleType            1
SaleCondition       0
dtype: int64
In [721]:
### Find missing values % for train data.
missing_value = (data.isna().sum()/len(data)).round(4)*100
missing_value.sort_values(ascending=False)
#missing_value.count
Out[721]:
PoolQC           99.52
MiscFeature      96.30
Alley            93.77
Fence            80.75
FireplaceQu      47.26
LotFrontage      17.74
GarageCond        5.55
GarageType        5.55
GarageYrBlt       5.55
GarageFinish      5.55
GarageQual        5.55
BsmtExposure      2.60
BsmtFinType2      2.60
BsmtFinType1      2.53
BsmtCond          2.53
BsmtQual          2.53
MasVnrArea        0.55
MasVnrType        0.55
Electrical        0.07
Utilities         0.00
YearRemodAdd      0.00
MSSubClass        0.00
Foundation        0.00
ExterCond         0.00
ExterQual         0.00
Exterior2nd       0.00
Exterior1st       0.00
RoofMatl          0.00
RoofStyle         0.00
YearBuilt         0.00
LotConfig         0.00
OverallCond       0.00
OverallQual       0.00
HouseStyle        0.00
BldgType          0.00
Condition2        0.00
BsmtFinSF1        0.00
MSZoning          0.00
LotArea           0.00
Street            0.00
Condition1        0.00
Neighborhood      0.00
LotShape          0.00
LandContour       0.00
LandSlope         0.00
SalePrice         0.00
HeatingQC         0.00
BsmtFinSF2        0.00
EnclosedPorch     0.00
Fireplaces        0.00
GarageCars        0.00
GarageArea        0.00
PavedDrive        0.00
WoodDeckSF        0.00
OpenPorchSF       0.00
3SsnPorch         0.00
BsmtUnfSF         0.00
ScreenPorch       0.00
PoolArea          0.00
MiscVal           0.00
MoSold            0.00
YrSold            0.00
SaleType          0.00
Functional        0.00
TotRmsAbvGrd      0.00
KitchenQual       0.00
KitchenAbvGr      0.00
BedroomAbvGr      0.00
HalfBath          0.00
FullBath          0.00
BsmtHalfBath      0.00
BsmtFullBath      0.00
GrLivArea         0.00
LowQualFinSF      0.00
2ndFlrSF          0.00
1stFlrSF          0.00
CentralAir        0.00
SaleCondition     0.00
Heating           0.00
TotalBsmtSF       0.00
Id                0.00
dtype: float64
In [722]:
### Find missing values % for test data.
missing_value_test = (test_data.isna().sum()/len(test_data)).round(4)*100
missing_value_test.sort_values(ascending=False)
Out[722]:
PoolQC           99.79
MiscFeature      96.50
Alley            92.67
Fence            80.12
FireplaceQu      50.03
LotFrontage      15.56
GarageCond        5.35
GarageQual        5.35
GarageYrBlt       5.35
GarageFinish      5.35
GarageType        5.21
BsmtCond          3.08
BsmtQual          3.02
BsmtExposure      3.02
BsmtFinType1      2.88
BsmtFinType2      2.88
MasVnrType        1.10
MasVnrArea        1.03
MSZoning          0.27
BsmtHalfBath      0.14
Utilities         0.14
Functional        0.14
BsmtFullBath      0.14
BsmtFinSF2        0.07
BsmtFinSF1        0.07
Exterior2nd       0.07
BsmtUnfSF         0.07
TotalBsmtSF       0.07
SaleType          0.07
Exterior1st       0.07
KitchenQual       0.07
GarageArea        0.07
GarageCars        0.07
HouseStyle        0.00
LandSlope         0.00
MSSubClass        0.00
LotArea           0.00
Street            0.00
LotShape          0.00
LandContour       0.00
LotConfig         0.00
Neighborhood      0.00
BldgType          0.00
Condition1        0.00
Condition2        0.00
RoofMatl          0.00
RoofStyle         0.00
YearRemodAdd      0.00
YearBuilt         0.00
OverallCond       0.00
OverallQual       0.00
SaleCondition     0.00
Heating           0.00
ExterQual         0.00
TotRmsAbvGrd      0.00
YrSold            0.00
MoSold            0.00
MiscVal           0.00
PoolArea          0.00
ScreenPorch       0.00
3SsnPorch         0.00
EnclosedPorch     0.00
OpenPorchSF       0.00
WoodDeckSF        0.00
PavedDrive        0.00
Fireplaces        0.00
KitchenAbvGr      0.00
ExterCond         0.00
BedroomAbvGr      0.00
HalfBath          0.00
FullBath          0.00
GrLivArea         0.00
LowQualFinSF      0.00
2ndFlrSF          0.00
1stFlrSF          0.00
Electrical        0.00
CentralAir        0.00
HeatingQC         0.00
Foundation        0.00
Id                0.00
dtype: float64
In [723]:
## Method will return number of levels,null values,unique values,data types

def Observations(df):
    return(pd.DataFrame({'dtypes' : df.dtypes,
                         'levels' : [df[x].unique() for x in df.columns],
                         'null_values' : df.isna().sum(),
                         'Unique Values': df.nunique()
                        }))

## Get columns data types,number of leveel,null values,unique value for each column of train data.
Observations(data)
Out[723]:
dtypes levels null_values Unique Values
Id int64 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... 0 1460
MSSubClass int64 [60, 20, 70, 50, 190, 45, 90, 120, 30, 85, 80,... 0 15
MSZoning object [RL, RM, C (all), FV, RH] 0 5
LotFrontage float64 [65.0, 80.0, 68.0, 60.0, 84.0, 85.0, 75.0, nan... 259 110
LotArea int64 [8450, 9600, 11250, 9550, 14260, 14115, 10084,... 0 1073
Street object [Pave, Grvl] 0 2
Alley object [nan, Grvl, Pave] 1369 2
LotShape object [Reg, IR1, IR2, IR3] 0 4
LandContour object [Lvl, Bnk, Low, HLS] 0 4
Utilities object [AllPub, NoSeWa] 0 2
LotConfig object [Inside, FR2, Corner, CulDSac, FR3] 0 5
LandSlope object [Gtl, Mod, Sev] 0 3
Neighborhood object [CollgCr, Veenker, Crawfor, NoRidge, Mitchel, ... 0 25
Condition1 object [Norm, Feedr, PosN, Artery, RRAe, RRNn, RRAn, ... 0 9
Condition2 object [Norm, Artery, RRNn, Feedr, PosN, PosA, RRAn, ... 0 8
BldgType object [1Fam, 2fmCon, Duplex, TwnhsE, Twnhs] 0 5
HouseStyle object [2Story, 1Story, 1.5Fin, 1.5Unf, SFoyer, SLvl,... 0 8
OverallQual int64 [7, 6, 8, 5, 9, 4, 10, 3, 1, 2] 0 10
OverallCond int64 [5, 8, 6, 7, 4, 2, 3, 9, 1] 0 9
YearBuilt int64 [2003, 1976, 2001, 1915, 2000, 1993, 2004, 197... 0 112
YearRemodAdd int64 [2003, 1976, 2002, 1970, 2000, 1995, 2005, 197... 0 61
RoofStyle object [Gable, Hip, Gambrel, Mansard, Flat, Shed] 0 6
RoofMatl object [CompShg, WdShngl, Metal, WdShake, Membran, Ta... 0 8
Exterior1st object [VinylSd, MetalSd, Wd Sdng, HdBoard, BrkFace, ... 0 15
Exterior2nd object [VinylSd, MetalSd, Wd Shng, HdBoard, Plywood, ... 0 16
MasVnrType object [BrkFace, None, Stone, BrkCmn, nan] 8 4
MasVnrArea float64 [196.0, 0.0, 162.0, 350.0, 186.0, 240.0, 286.0... 8 327
ExterQual object [Gd, TA, Ex, Fa] 0 4
ExterCond object [TA, Gd, Fa, Po, Ex] 0 5
Foundation object [PConc, CBlock, BrkTil, Wood, Slab, Stone] 0 6
BsmtQual object [Gd, TA, Ex, nan, Fa] 37 4
BsmtCond object [TA, Gd, nan, Fa, Po] 37 4
BsmtExposure object [No, Gd, Mn, Av, nan] 38 4
BsmtFinType1 object [GLQ, ALQ, Unf, Rec, BLQ, nan, LwQ] 37 6
BsmtFinSF1 int64 [706, 978, 486, 216, 655, 732, 1369, 859, 0, 8... 0 637
BsmtFinType2 object [Unf, BLQ, nan, ALQ, Rec, LwQ, GLQ] 38 6
BsmtFinSF2 int64 [0, 32, 668, 486, 93, 491, 506, 712, 362, 41, ... 0 144
BsmtUnfSF int64 [150, 284, 434, 540, 490, 64, 317, 216, 952, 1... 0 780
TotalBsmtSF int64 [856, 1262, 920, 756, 1145, 796, 1686, 1107, 9... 0 721
Heating object [GasA, GasW, Grav, Wall, OthW, Floor] 0 6
HeatingQC object [Ex, Gd, TA, Fa, Po] 0 5
CentralAir object [Y, N] 0 2
Electrical object [SBrkr, FuseF, FuseA, FuseP, Mix, nan] 1 5
1stFlrSF int64 [856, 1262, 920, 961, 1145, 796, 1694, 1107, 1... 0 753
2ndFlrSF int64 [854, 0, 866, 756, 1053, 566, 983, 752, 1142, ... 0 417
LowQualFinSF int64 [0, 360, 513, 234, 528, 572, 144, 392, 371, 39... 0 24
GrLivArea int64 [1710, 1262, 1786, 1717, 2198, 1362, 1694, 209... 0 861
BsmtFullBath int64 [1, 0, 2, 3] 0 4
BsmtHalfBath int64 [0, 1, 2] 0 3
FullBath int64 [2, 1, 3, 0] 0 4
HalfBath int64 [1, 0, 2] 0 3
BedroomAbvGr int64 [3, 4, 1, 2, 0, 5, 6, 8] 0 8
KitchenAbvGr int64 [1, 2, 3, 0] 0 4
KitchenQual object [Gd, TA, Ex, Fa] 0 4
TotRmsAbvGrd int64 [8, 6, 7, 9, 5, 11, 4, 10, 12, 3, 2, 14] 0 12
Functional object [Typ, Min1, Maj1, Min2, Mod, Maj2, Sev] 0 7
Fireplaces int64 [0, 1, 2, 3] 0 4
FireplaceQu object [nan, TA, Gd, Fa, Ex, Po] 690 5
GarageType object [Attchd, Detchd, BuiltIn, CarPort, nan, Basmen... 81 6
GarageYrBlt float64 [2003.0, 1976.0, 2001.0, 1998.0, 2000.0, 1993.... 81 97
GarageFinish object [RFn, Unf, Fin, nan] 81 3
GarageCars int64 [2, 3, 1, 0, 4] 0 5
GarageArea int64 [548, 460, 608, 642, 836, 480, 636, 484, 468, ... 0 441
GarageQual object [TA, Fa, Gd, nan, Ex, Po] 81 5
GarageCond object [TA, Fa, nan, Gd, Po, Ex] 81 5
PavedDrive object [Y, N, P] 0 3
WoodDeckSF int64 [0, 298, 192, 40, 255, 235, 90, 147, 140, 160,... 0 274
OpenPorchSF int64 [61, 0, 42, 35, 84, 30, 57, 204, 4, 21, 33, 21... 0 202
EnclosedPorch int64 [0, 272, 228, 205, 176, 87, 172, 102, 37, 144,... 0 120
3SsnPorch int64 [0, 320, 407, 130, 180, 168, 140, 508, 238, 24... 0 20
ScreenPorch int64 [0, 176, 198, 291, 252, 99, 184, 168, 130, 142... 0 76
PoolArea int64 [0, 512, 648, 576, 555, 480, 519, 738] 0 8
PoolQC object [nan, Ex, Fa, Gd] 1453 3
Fence object [nan, MnPrv, GdWo, GdPrv, MnWw] 1179 4
MiscFeature object [nan, Shed, Gar2, Othr, TenC] 1406 4
MiscVal int64 [0, 700, 350, 500, 400, 480, 450, 15500, 1200,... 0 21
MoSold int64 [2, 5, 9, 12, 10, 8, 11, 4, 1, 7, 3, 6] 0 12
YrSold int64 [2008, 2007, 2006, 2009, 2010] 0 5
SaleType object [WD, New, COD, ConLD, ConLI, CWD, ConLw, Con, ... 0 9
SaleCondition object [Normal, Abnorml, Partial, AdjLand, Alloca, Fa... 0 6
SalePrice int64 [208500, 181500, 223500, 140000, 250000, 14300... 0 663
In [724]:
## Get columns data types,number of leveel,null values,unique value for each column of test data.
Observations(test_data)
Out[724]:
dtypes levels null_values Unique Values
Id int64 [1461, 1462, 1463, 1464, 1465, 1466, 1467, 146... 0 1459
MSSubClass int64 [20, 60, 120, 160, 80, 30, 50, 90, 85, 190, 45... 0 16
MSZoning object [RH, RL, RM, FV, C (all), nan] 4 5
LotFrontage float64 [80.0, 81.0, 74.0, 78.0, 43.0, 75.0, nan, 63.0... 227 115
LotArea int64 [11622, 14267, 13830, 9978, 5005, 10000, 7980,... 0 1106
Street object [Pave, Grvl] 0 2
Alley object [nan, Pave, Grvl] 1352 2
LotShape object [Reg, IR1, IR2, IR3] 0 4
LandContour object [Lvl, HLS, Bnk, Low] 0 4
Utilities object [AllPub, nan] 2 1
LotConfig object [Inside, Corner, FR2, CulDSac, FR3] 0 5
LandSlope object [Gtl, Mod, Sev] 0 3
Neighborhood object [NAmes, Gilbert, StoneBr, BrDale, NPkVill, Nri... 0 25
Condition1 object [Feedr, Norm, PosN, RRNe, Artery, RRNn, PosA, ... 0 9
Condition2 object [Norm, Feedr, PosA, PosN, Artery] 0 5
BldgType object [1Fam, TwnhsE, Twnhs, Duplex, 2fmCon] 0 5
HouseStyle object [1Story, 2Story, SLvl, 1.5Fin, SFoyer, 2.5Unf,... 0 7
OverallQual int64 [5, 6, 8, 7, 4, 9, 2, 3, 10, 1] 0 10
OverallCond int64 [6, 5, 7, 8, 2, 9, 3, 4, 1] 0 9
YearBuilt int64 [1961, 1958, 1997, 1998, 1992, 1993, 1990, 197... 0 106
YearRemodAdd int64 [1961, 1958, 1998, 1992, 1994, 2007, 1990, 197... 0 61
RoofStyle object [Gable, Hip, Gambrel, Flat, Mansard, Shed] 0 6
RoofMatl object [CompShg, Tar&Grv, WdShake, WdShngl] 0 4
Exterior1st object [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... 1 13
Exterior2nd object [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... 1 15
MasVnrType object [None, BrkFace, Stone, BrkCmn, nan] 16 4
MasVnrArea float64 [0.0, 108.0, 20.0, 504.0, 492.0, 162.0, 256.0,... 15 303
ExterQual object [TA, Gd, Ex, Fa] 0 4
ExterCond object [TA, Gd, Fa, Po, Ex] 0 5
Foundation object [CBlock, PConc, BrkTil, Stone, Slab, Wood] 0 6
BsmtQual object [TA, Gd, Ex, Fa, nan] 44 4
BsmtCond object [TA, Po, Fa, Gd, nan] 45 4
BsmtExposure object [No, Gd, Mn, Av, nan] 44 4
BsmtFinType1 object [Rec, ALQ, GLQ, Unf, BLQ, LwQ, nan] 42 6
BsmtFinSF1 float64 [468.0, 923.0, 791.0, 602.0, 263.0, 0.0, 935.0... 1 669
BsmtFinType2 object [LwQ, Unf, Rec, BLQ, GLQ, ALQ, nan] 42 6
BsmtFinSF2 float64 [144.0, 0.0, 78.0, 859.0, 981.0, 42.0, 46.0, 1... 1 161
BsmtUnfSF float64 [270.0, 406.0, 137.0, 324.0, 1017.0, 763.0, 23... 1 793
TotalBsmtSF float64 [882.0, 1329.0, 928.0, 926.0, 1280.0, 763.0, 1... 1 736
Heating object [GasA, GasW, Grav, Wall] 0 4
HeatingQC object [TA, Gd, Ex, Fa, Po] 0 5
CentralAir object [Y, N] 0 2
Electrical object [SBrkr, FuseA, FuseF, FuseP] 0 4
1stFlrSF int64 [896, 1329, 928, 926, 1280, 763, 1187, 789, 13... 0 789
2ndFlrSF int64 [0, 701, 678, 892, 676, 504, 567, 601, 707, 56... 0 407
LowQualFinSF int64 [0, 362, 1064, 431, 436, 259, 312, 108, 697, 5... 0 15
GrLivArea int64 [896, 1329, 1629, 1604, 1280, 1655, 1187, 1465... 0 879
BsmtFullBath float64 [0.0, 1.0, 2.0, 3.0, nan] 2 4
BsmtHalfBath float64 [0.0, 1.0, nan, 2.0] 2 3
FullBath int64 [1, 2, 3, 4, 0] 0 5
HalfBath int64 [0, 1, 2] 0 3
BedroomAbvGr int64 [2, 3, 4, 1, 6, 5, 0] 0 7
KitchenAbvGr int64 [1, 2, 0] 0 3
KitchenQual object [TA, Gd, Ex, Fa, nan] 1 4
TotRmsAbvGrd int64 [5, 6, 7, 4, 10, 8, 9, 3, 12, 11, 13, 15] 0 12
Functional object [Typ, Min2, Min1, Mod, Maj1, Sev, Maj2, nan] 2 7
Fireplaces int64 [0, 1, 2, 3, 4] 0 5
FireplaceQu object [nan, TA, Gd, Po, Fa, Ex] 730 5
GarageType object [Attchd, Detchd, BuiltIn, nan, Basment, 2Types... 76 6
GarageYrBlt float64 [1961.0, 1958.0, 1997.0, 1998.0, 1992.0, 1993.... 78 97
GarageFinish object [Unf, Fin, RFn, nan] 78 3
GarageCars float64 [1.0, 2.0, 3.0, 0.0, 4.0, 5.0, nan] 1 6
GarageArea float64 [730.0, 312.0, 482.0, 470.0, 506.0, 440.0, 420... 1 459
GarageQual object [TA, nan, Fa, Gd, Po] 78 4
GarageCond object [TA, nan, Fa, Gd, Po, Ex] 78 5
PavedDrive object [Y, N, P] 0 3
WoodDeckSF int64 [140, 393, 212, 360, 0, 157, 483, 192, 240, 20... 0 263
OpenPorchSF int64 [0, 36, 34, 82, 84, 21, 75, 68, 30, 133, 35, 7... 0 203
EnclosedPorch int64 [0, 80, 186, 120, 150, 205, 113, 135, 126, 334... 0 131
3SsnPorch int64 [0, 224, 255, 225, 360, 150, 153, 174, 120, 21... 0 13
ScreenPorch int64 [120, 0, 144, 256, 216, 204, 160, 240, 148, 16... 0 75
PoolArea int64 [0, 144, 368, 444, 228, 561, 800] 0 7
PoolQC object [nan, Ex, Gd] 1456 2
Fence object [MnPrv, nan, GdPrv, GdWo, MnWw] 1169 4
MiscFeature object [nan, Gar2, Shed, Othr] 1408 3
MiscVal int64 [0, 12500, 500, 1500, 300, 450, 80, 600, 490, ... 0 26
MoSold int64 [6, 3, 1, 4, 5, 2, 7, 10, 8, 11, 9, 12] 0 12
YrSold int64 [2010, 2009, 2008, 2007, 2006] 0 5
SaleType object [WD, COD, New, ConLD, Oth, Con, ConLw, ConLI, ... 1 9
SaleCondition object [Normal, Partial, Abnorml, Family, Alloca, Adj... 0 6
In [725]:
## Check dimensionns of train data.
data.shape
Out[725]:
(1460, 81)
In [726]:
## Check dimesnions of test data.
test_data.shape
Out[726]:
(1459, 80)
In [727]:
### change nan calumn names to other
#data.Alley.replace(to_replace=dict(nan='NAC'), inplace=True)
In [728]:
#data.Alley.replace(['NaN'], ['NAC'], inplace=True)
In [729]:
#data.Alley[data.Alley == 'nan'] = 'NAC' 
In [730]:
#data.Alley.replace(to_replace ="nan", value ="NAC", inplace=True) 
In [731]:
## I am replacing NA to NAA for Alley column of train data,bcz NA is having different levels meaning.
data.Alley.fillna('NAA',inplace=True)
In [732]:
## I am replacing NA to NAA for Alley column of test data,bcz NA is having different levels meaning.
test_data.Alley.fillna('NAA',inplace=True)
In [733]:
#data.Alley.replace(to_replace = np.nan, value ='NAC', inplace=True)
In [734]:
## Check unique values for Alley column of train data.
data.Alley.unique()
Out[734]:
array(['NAA', 'Grvl', 'Pave'], dtype=object)
In [735]:
## Check unique values for Alley column of test data.
test_data.Alley.unique()
Out[735]:
array(['NAA', 'Pave', 'Grvl'], dtype=object)
In [736]:
## Check first 5 records of train data.
data.head()
Out[736]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NaN Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NAA Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NAA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NAA IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NAA IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NaN NaN NaN 0 12 2008 WD Normal 250000
In [737]:
## Fill NAs with NB value for BsmtQual column of train data.
#data.BsmtQual.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtQual.fillna('NB',inplace=True)
In [738]:
## Fill NAs with NB value for BsmtQual column of test data.
test_data.BsmtQual.fillna('NB',inplace=True)
In [739]:
## Fill NAs with NB value for BsmtCond column of train data.
#data.BsmtCond.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtCond.fillna('NB',inplace=True)
In [740]:
## Fill NAs with NB value for BsmtCond column of test data.
test_data.BsmtCond.fillna('NB',inplace=True)
In [741]:
## Fill NAs with NB value for BsmtExposure column of train data.
#data.BsmtExposure.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtExposure.fillna('NB',inplace=True)
In [742]:
## Fill NAs with NB value for BsmtExposure column of test data.
test_data.BsmtExposure.fillna('NB',inplace=True)
In [743]:
## Fill NAs with NB value for BsmtFinType1 column of train data.
#data.BsmtFinType1.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtFinType1.fillna('NB',inplace=True)
In [744]:
## Fill NAs with NB value for BsmtFinType1 column of test data.
test_data.BsmtFinType1.fillna('NB',inplace=True)
In [745]:
## Fill NAs with NB value for BsmtFinType2 column of train data.
#data.BsmtFinType2.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtFinType2.fillna('NB',inplace=True)
In [746]:
## Fill NAs with NB value for BsmtFinType2 column of test data.
test_data.BsmtFinType2.fillna('NB',inplace=True)
In [747]:
## Fill NAs with NF value for FireplaceQu column of train data.
#data.FireplaceQu.replace(to_replace = np.nan, value ='NF', inplace=True)
data.FireplaceQu.fillna('NF',inplace=True)
In [748]:
## Fill NAs with NF value for FireplaceQu column of test data.
test_data.FireplaceQu.fillna('NF',inplace=True)
In [749]:
## Fill NAs with NG value for GarageType column of train data.
data.GarageType.fillna('NG',inplace=True)
In [750]:
## Fill NAs with NG value for GarageType column of test data.
test_data.GarageType.fillna('NG',inplace=True)
In [751]:
## Fill NAs with NG value for GarageFinish column of train data.
data.GarageFinish.fillna('NG',inplace=True)
In [752]:
## Fill NAs with NG value for GarageFinish column of test data.
test_data.GarageFinish.fillna('NG',inplace=True)
In [753]:
## Fill NAs with NG value for GarageQual column of train data.
data.GarageQual.fillna('NG',inplace=True)
In [754]:
## Fill NAs with NG value for GarageQual column of test data.
test_data.GarageQual.fillna('NG',inplace=True)
In [755]:
## Fill NAs with NG value for GarageCond column of train data.
data.GarageCond.fillna('NG',inplace=True)
In [756]:
## Fill NAs with NG value for GarageCond column of test data.
test_data.GarageCond.fillna('NG',inplace=True)
In [757]:
## Fill NAs with NP value for PoolQC column of train data.
data.PoolQC.fillna('NP',inplace=True)
In [758]:
## Fill NAs with NP value for PoolQC column of test data.
test_data.PoolQC.fillna('NP',inplace=True)
In [759]:
## Fill NAs with NF value for Fence column of train data.
data.Fence.fillna('NF',inplace=True)
In [760]:
## Fill NAs with NF value for Fence column of test data.
test_data.Fence.fillna('NF',inplace=True)
In [761]:
## Fill NAs with NE value for MiscFeature column of train data.
data.MiscFeature.fillna('NE',inplace=True)
In [762]:
## Fill NAs with NE value for MiscFeature column of test data.
test_data.MiscFeature.fillna('NE',inplace=True)
In [763]:
## Display first record of train data.
data[:1]
Out[763]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NF Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NP NF NE 0 2 2008 WD Normal 208500
In [764]:
## Display first record of test data.
test_data[:1]
Out[764]:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
0 1461 20 RH 80.0 11622 Pave NAA Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 1961 1961 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA No Rec 468.0 LwQ 144.0 270.0 882.0 GasA TA Y SBrkr 896 0 0 896 0.0 0.0 1 0 2 1 TA 5 Typ 0 NF Attchd 1961.0 Unf 1.0 730.0 TA TA Y 140 0 0 0 120 0 NP MnPrv NE 0 6 2010 WD Normal
In [765]:
## Get summary statistics of train data.
Observations(data)
Out[765]:
dtypes levels null_values Unique Values
Id int64 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... 0 1460
MSSubClass int64 [60, 20, 70, 50, 190, 45, 90, 120, 30, 85, 80,... 0 15
MSZoning object [RL, RM, C (all), FV, RH] 0 5
LotFrontage float64 [65.0, 80.0, 68.0, 60.0, 84.0, 85.0, 75.0, nan... 259 110
LotArea int64 [8450, 9600, 11250, 9550, 14260, 14115, 10084,... 0 1073
Street object [Pave, Grvl] 0 2
Alley object [NAA, Grvl, Pave] 0 3
LotShape object [Reg, IR1, IR2, IR3] 0 4
LandContour object [Lvl, Bnk, Low, HLS] 0 4
Utilities object [AllPub, NoSeWa] 0 2
LotConfig object [Inside, FR2, Corner, CulDSac, FR3] 0 5
LandSlope object [Gtl, Mod, Sev] 0 3
Neighborhood object [CollgCr, Veenker, Crawfor, NoRidge, Mitchel, ... 0 25
Condition1 object [Norm, Feedr, PosN, Artery, RRAe, RRNn, RRAn, ... 0 9
Condition2 object [Norm, Artery, RRNn, Feedr, PosN, PosA, RRAn, ... 0 8
BldgType object [1Fam, 2fmCon, Duplex, TwnhsE, Twnhs] 0 5
HouseStyle object [2Story, 1Story, 1.5Fin, 1.5Unf, SFoyer, SLvl,... 0 8
OverallQual int64 [7, 6, 8, 5, 9, 4, 10, 3, 1, 2] 0 10
OverallCond int64 [5, 8, 6, 7, 4, 2, 3, 9, 1] 0 9
YearBuilt int64 [2003, 1976, 2001, 1915, 2000, 1993, 2004, 197... 0 112
YearRemodAdd int64 [2003, 1976, 2002, 1970, 2000, 1995, 2005, 197... 0 61
RoofStyle object [Gable, Hip, Gambrel, Mansard, Flat, Shed] 0 6
RoofMatl object [CompShg, WdShngl, Metal, WdShake, Membran, Ta... 0 8
Exterior1st object [VinylSd, MetalSd, Wd Sdng, HdBoard, BrkFace, ... 0 15
Exterior2nd object [VinylSd, MetalSd, Wd Shng, HdBoard, Plywood, ... 0 16
MasVnrType object [BrkFace, None, Stone, BrkCmn, nan] 8 4
MasVnrArea float64 [196.0, 0.0, 162.0, 350.0, 186.0, 240.0, 286.0... 8 327
ExterQual object [Gd, TA, Ex, Fa] 0 4
ExterCond object [TA, Gd, Fa, Po, Ex] 0 5
Foundation object [PConc, CBlock, BrkTil, Wood, Slab, Stone] 0 6
BsmtQual object [Gd, TA, Ex, NB, Fa] 0 5
BsmtCond object [TA, Gd, NB, Fa, Po] 0 5
BsmtExposure object [No, Gd, Mn, Av, NB] 0 5
BsmtFinType1 object [GLQ, ALQ, Unf, Rec, BLQ, NB, LwQ] 0 7
BsmtFinSF1 int64 [706, 978, 486, 216, 655, 732, 1369, 859, 0, 8... 0 637
BsmtFinType2 object [Unf, BLQ, NB, ALQ, Rec, LwQ, GLQ] 0 7
BsmtFinSF2 int64 [0, 32, 668, 486, 93, 491, 506, 712, 362, 41, ... 0 144
BsmtUnfSF int64 [150, 284, 434, 540, 490, 64, 317, 216, 952, 1... 0 780
TotalBsmtSF int64 [856, 1262, 920, 756, 1145, 796, 1686, 1107, 9... 0 721
Heating object [GasA, GasW, Grav, Wall, OthW, Floor] 0 6
HeatingQC object [Ex, Gd, TA, Fa, Po] 0 5
CentralAir object [Y, N] 0 2
Electrical object [SBrkr, FuseF, FuseA, FuseP, Mix, nan] 1 5
1stFlrSF int64 [856, 1262, 920, 961, 1145, 796, 1694, 1107, 1... 0 753
2ndFlrSF int64 [854, 0, 866, 756, 1053, 566, 983, 752, 1142, ... 0 417
LowQualFinSF int64 [0, 360, 513, 234, 528, 572, 144, 392, 371, 39... 0 24
GrLivArea int64 [1710, 1262, 1786, 1717, 2198, 1362, 1694, 209... 0 861
BsmtFullBath int64 [1, 0, 2, 3] 0 4
BsmtHalfBath int64 [0, 1, 2] 0 3
FullBath int64 [2, 1, 3, 0] 0 4
HalfBath int64 [1, 0, 2] 0 3
BedroomAbvGr int64 [3, 4, 1, 2, 0, 5, 6, 8] 0 8
KitchenAbvGr int64 [1, 2, 3, 0] 0 4
KitchenQual object [Gd, TA, Ex, Fa] 0 4
TotRmsAbvGrd int64 [8, 6, 7, 9, 5, 11, 4, 10, 12, 3, 2, 14] 0 12
Functional object [Typ, Min1, Maj1, Min2, Mod, Maj2, Sev] 0 7
Fireplaces int64 [0, 1, 2, 3] 0 4
FireplaceQu object [NF, TA, Gd, Fa, Ex, Po] 0 6
GarageType object [Attchd, Detchd, BuiltIn, CarPort, NG, Basment... 0 7
GarageYrBlt float64 [2003.0, 1976.0, 2001.0, 1998.0, 2000.0, 1993.... 81 97
GarageFinish object [RFn, Unf, Fin, NG] 0 4
GarageCars int64 [2, 3, 1, 0, 4] 0 5
GarageArea int64 [548, 460, 608, 642, 836, 480, 636, 484, 468, ... 0 441
GarageQual object [TA, Fa, Gd, NG, Ex, Po] 0 6
GarageCond object [TA, Fa, NG, Gd, Po, Ex] 0 6
PavedDrive object [Y, N, P] 0 3
WoodDeckSF int64 [0, 298, 192, 40, 255, 235, 90, 147, 140, 160,... 0 274
OpenPorchSF int64 [61, 0, 42, 35, 84, 30, 57, 204, 4, 21, 33, 21... 0 202
EnclosedPorch int64 [0, 272, 228, 205, 176, 87, 172, 102, 37, 144,... 0 120
3SsnPorch int64 [0, 320, 407, 130, 180, 168, 140, 508, 238, 24... 0 20
ScreenPorch int64 [0, 176, 198, 291, 252, 99, 184, 168, 130, 142... 0 76
PoolArea int64 [0, 512, 648, 576, 555, 480, 519, 738] 0 8
PoolQC object [NP, Ex, Fa, Gd] 0 4
Fence object [NF, MnPrv, GdWo, GdPrv, MnWw] 0 5
MiscFeature object [NE, Shed, Gar2, Othr, TenC] 0 5
MiscVal int64 [0, 700, 350, 500, 400, 480, 450, 15500, 1200,... 0 21
MoSold int64 [2, 5, 9, 12, 10, 8, 11, 4, 1, 7, 3, 6] 0 12
YrSold int64 [2008, 2007, 2006, 2009, 2010] 0 5
SaleType object [WD, New, COD, ConLD, ConLI, CWD, ConLw, Con, ... 0 9
SaleCondition object [Normal, Abnorml, Partial, AdjLand, Alloca, Fa... 0 6
SalePrice int64 [208500, 181500, 223500, 140000, 250000, 14300... 0 663
In [766]:
## Get summary statistics of test data.
Observations(test_data)
Out[766]:
dtypes levels null_values Unique Values
Id int64 [1461, 1462, 1463, 1464, 1465, 1466, 1467, 146... 0 1459
MSSubClass int64 [20, 60, 120, 160, 80, 30, 50, 90, 85, 190, 45... 0 16
MSZoning object [RH, RL, RM, FV, C (all), nan] 4 5
LotFrontage float64 [80.0, 81.0, 74.0, 78.0, 43.0, 75.0, nan, 63.0... 227 115
LotArea int64 [11622, 14267, 13830, 9978, 5005, 10000, 7980,... 0 1106
Street object [Pave, Grvl] 0 2
Alley object [NAA, Pave, Grvl] 0 3
LotShape object [Reg, IR1, IR2, IR3] 0 4
LandContour object [Lvl, HLS, Bnk, Low] 0 4
Utilities object [AllPub, nan] 2 1
LotConfig object [Inside, Corner, FR2, CulDSac, FR3] 0 5
LandSlope object [Gtl, Mod, Sev] 0 3
Neighborhood object [NAmes, Gilbert, StoneBr, BrDale, NPkVill, Nri... 0 25
Condition1 object [Feedr, Norm, PosN, RRNe, Artery, RRNn, PosA, ... 0 9
Condition2 object [Norm, Feedr, PosA, PosN, Artery] 0 5
BldgType object [1Fam, TwnhsE, Twnhs, Duplex, 2fmCon] 0 5
HouseStyle object [1Story, 2Story, SLvl, 1.5Fin, SFoyer, 2.5Unf,... 0 7
OverallQual int64 [5, 6, 8, 7, 4, 9, 2, 3, 10, 1] 0 10
OverallCond int64 [6, 5, 7, 8, 2, 9, 3, 4, 1] 0 9
YearBuilt int64 [1961, 1958, 1997, 1998, 1992, 1993, 1990, 197... 0 106
YearRemodAdd int64 [1961, 1958, 1998, 1992, 1994, 2007, 1990, 197... 0 61
RoofStyle object [Gable, Hip, Gambrel, Flat, Mansard, Shed] 0 6
RoofMatl object [CompShg, Tar&Grv, WdShake, WdShngl] 0 4
Exterior1st object [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... 1 13
Exterior2nd object [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... 1 15
MasVnrType object [None, BrkFace, Stone, BrkCmn, nan] 16 4
MasVnrArea float64 [0.0, 108.0, 20.0, 504.0, 492.0, 162.0, 256.0,... 15 303
ExterQual object [TA, Gd, Ex, Fa] 0 4
ExterCond object [TA, Gd, Fa, Po, Ex] 0 5
Foundation object [CBlock, PConc, BrkTil, Stone, Slab, Wood] 0 6
BsmtQual object [TA, Gd, Ex, Fa, NB] 0 5
BsmtCond object [TA, Po, Fa, Gd, NB] 0 5
BsmtExposure object [No, Gd, Mn, Av, NB] 0 5
BsmtFinType1 object [Rec, ALQ, GLQ, Unf, BLQ, LwQ, NB] 0 7
BsmtFinSF1 float64 [468.0, 923.0, 791.0, 602.0, 263.0, 0.0, 935.0... 1 669
BsmtFinType2 object [LwQ, Unf, Rec, BLQ, GLQ, ALQ, NB] 0 7
BsmtFinSF2 float64 [144.0, 0.0, 78.0, 859.0, 981.0, 42.0, 46.0, 1... 1 161
BsmtUnfSF float64 [270.0, 406.0, 137.0, 324.0, 1017.0, 763.0, 23... 1 793
TotalBsmtSF float64 [882.0, 1329.0, 928.0, 926.0, 1280.0, 763.0, 1... 1 736
Heating object [GasA, GasW, Grav, Wall] 0 4
HeatingQC object [TA, Gd, Ex, Fa, Po] 0 5
CentralAir object [Y, N] 0 2
Electrical object [SBrkr, FuseA, FuseF, FuseP] 0 4
1stFlrSF int64 [896, 1329, 928, 926, 1280, 763, 1187, 789, 13... 0 789
2ndFlrSF int64 [0, 701, 678, 892, 676, 504, 567, 601, 707, 56... 0 407
LowQualFinSF int64 [0, 362, 1064, 431, 436, 259, 312, 108, 697, 5... 0 15
GrLivArea int64 [896, 1329, 1629, 1604, 1280, 1655, 1187, 1465... 0 879
BsmtFullBath float64 [0.0, 1.0, 2.0, 3.0, nan] 2 4
BsmtHalfBath float64 [0.0, 1.0, nan, 2.0] 2 3
FullBath int64 [1, 2, 3, 4, 0] 0 5
HalfBath int64 [0, 1, 2] 0 3
BedroomAbvGr int64 [2, 3, 4, 1, 6, 5, 0] 0 7
KitchenAbvGr int64 [1, 2, 0] 0 3
KitchenQual object [TA, Gd, Ex, Fa, nan] 1 4
TotRmsAbvGrd int64 [5, 6, 7, 4, 10, 8, 9, 3, 12, 11, 13, 15] 0 12
Functional object [Typ, Min2, Min1, Mod, Maj1, Sev, Maj2, nan] 2 7
Fireplaces int64 [0, 1, 2, 3, 4] 0 5
FireplaceQu object [NF, TA, Gd, Po, Fa, Ex] 0 6
GarageType object [Attchd, Detchd, BuiltIn, NG, Basment, 2Types,... 0 7
GarageYrBlt float64 [1961.0, 1958.0, 1997.0, 1998.0, 1992.0, 1993.... 78 97
GarageFinish object [Unf, Fin, RFn, NG] 0 4
GarageCars float64 [1.0, 2.0, 3.0, 0.0, 4.0, 5.0, nan] 1 6
GarageArea float64 [730.0, 312.0, 482.0, 470.0, 506.0, 440.0, 420... 1 459
GarageQual object [TA, NG, Fa, Gd, Po] 0 5
GarageCond object [TA, NG, Fa, Gd, Po, Ex] 0 6
PavedDrive object [Y, N, P] 0 3
WoodDeckSF int64 [140, 393, 212, 360, 0, 157, 483, 192, 240, 20... 0 263
OpenPorchSF int64 [0, 36, 34, 82, 84, 21, 75, 68, 30, 133, 35, 7... 0 203
EnclosedPorch int64 [0, 80, 186, 120, 150, 205, 113, 135, 126, 334... 0 131
3SsnPorch int64 [0, 224, 255, 225, 360, 150, 153, 174, 120, 21... 0 13
ScreenPorch int64 [120, 0, 144, 256, 216, 204, 160, 240, 148, 16... 0 75
PoolArea int64 [0, 144, 368, 444, 228, 561, 800] 0 7
PoolQC object [NP, Ex, Gd] 0 3
Fence object [MnPrv, NF, GdPrv, GdWo, MnWw] 0 5
MiscFeature object [NE, Gar2, Shed, Othr] 0 4
MiscVal int64 [0, 12500, 500, 1500, 300, 450, 80, 600, 490, ... 0 26
MoSold int64 [6, 3, 1, 4, 5, 2, 7, 10, 8, 11, 9, 12] 0 12
YrSold int64 [2010, 2009, 2008, 2007, 2006] 0 5
SaleType object [WD, COD, New, ConLD, Oth, Con, ConLw, ConLI, ... 1 9
SaleCondition object [Normal, Partial, Abnorml, Family, Alloca, Adj... 0 6
In [767]:
## Get data type of MSZoning column.
data.MSZoning.dtypes
Out[767]:
dtype('O')
In [771]:
## Fetch category,object data types columns from train data.
object_columns = data.select_dtypes(include=['object','category'])
In [772]:
## Fetch category,object data types columns from test data.
test_object_columns = test_data.select_dtypes(include=['object','category'])
In [773]:
## Display first records of category & object columns of train data.
object_columns.head()
Out[773]:
MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 RL Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA No GLQ Unf GasA Ex Y SBrkr Gd Typ NF Attchd RFn TA TA Y NP NF NE WD Normal
1 RL Pave NAA Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story Gable CompShg MetalSd MetalSd None TA TA CBlock Gd TA Gd ALQ Unf GasA Ex Y SBrkr TA Typ TA Attchd RFn TA TA Y NP NF NE WD Normal
2 RL Pave NAA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA Mn GLQ Unf GasA Ex Y SBrkr Gd Typ TA Attchd RFn TA TA Y NP NF NE WD Normal
3 RL Pave NAA IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story Gable CompShg Wd Sdng Wd Shng None TA TA BrkTil TA Gd No ALQ Unf GasA Gd Y SBrkr Gd Typ Gd Detchd Unf TA TA Y NP NF NE WD Abnorml
4 RL Pave NAA IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA Av GLQ Unf GasA Ex Y SBrkr Gd Typ TA Attchd RFn TA TA Y NP NF NE WD Normal
In [774]:
## Display first records of category & object columns of test data.
test_object_columns.head()
Out[774]:
MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 RH Pave NAA Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story Gable CompShg VinylSd VinylSd None TA TA CBlock TA TA No Rec LwQ GasA TA Y SBrkr TA Typ NF Attchd Unf TA TA Y NP MnPrv NE WD Normal
1 RL Pave NAA IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story Hip CompShg Wd Sdng Wd Sdng BrkFace TA TA CBlock TA TA No ALQ Unf GasA TA Y SBrkr Gd Typ NF Attchd Unf TA TA Y NP NF Gar2 WD Normal
2 RL Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story Gable CompShg VinylSd VinylSd None TA TA PConc Gd TA No GLQ Unf GasA Gd Y SBrkr TA Typ TA Attchd Fin TA TA Y NP MnPrv NE WD Normal
3 RL Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story Gable CompShg VinylSd VinylSd BrkFace TA TA PConc TA TA No GLQ Unf GasA Ex Y SBrkr Gd Typ Gd Attchd Fin TA TA Y NP NF NE WD Normal
4 RL Pave NAA IR1 HLS AllPub Inside Gtl StoneBr Norm Norm TwnhsE 1Story Gable CompShg HdBoard HdBoard None Gd TA PConc Gd TA No ALQ Unf GasA Ex Y SBrkr Gd Typ NF Attchd RFn TA TA Y NP NF NE WD Normal
In [775]:
## Convert object to category data type.
for col in object_columns.columns:
    data[col] = data[col].astype('str').astype('category')
In [776]:
## Get columns data types of train data.
data.dtypes
Out[776]:
Id                  int64
MSSubClass          int64
MSZoning         category
LotFrontage       float64
LotArea             int64
Street           category
Alley            category
LotShape         category
LandContour      category
Utilities        category
LotConfig        category
LandSlope        category
Neighborhood     category
Condition1       category
Condition2       category
BldgType         category
HouseStyle       category
OverallQual         int64
OverallCond         int64
YearBuilt           int64
YearRemodAdd        int64
RoofStyle        category
RoofMatl         category
Exterior1st      category
Exterior2nd      category
MasVnrType       category
MasVnrArea        float64
ExterQual        category
ExterCond        category
Foundation       category
BsmtQual         category
BsmtCond         category
BsmtExposure     category
BsmtFinType1     category
BsmtFinSF1          int64
BsmtFinType2     category
BsmtFinSF2          int64
BsmtUnfSF           int64
TotalBsmtSF         int64
Heating          category
HeatingQC        category
CentralAir       category
Electrical       category
1stFlrSF            int64
2ndFlrSF            int64
LowQualFinSF        int64
GrLivArea           int64
BsmtFullBath        int64
BsmtHalfBath        int64
FullBath            int64
HalfBath            int64
BedroomAbvGr        int64
KitchenAbvGr        int64
KitchenQual      category
TotRmsAbvGrd        int64
Functional       category
Fireplaces          int64
FireplaceQu      category
GarageType       category
GarageYrBlt       float64
GarageFinish     category
GarageCars          int64
GarageArea          int64
GarageQual       category
GarageCond       category
PavedDrive       category
WoodDeckSF          int64
OpenPorchSF         int64
EnclosedPorch       int64
3SsnPorch           int64
ScreenPorch         int64
PoolArea            int64
PoolQC           category
Fence            category
MiscFeature      category
MiscVal             int64
MoSold              int64
YrSold              int64
SaleType         category
SaleCondition    category
SalePrice           int64
dtype: object
In [777]:
## Convert object data typpe to category.
for col in test_object_columns.columns:
    test_data[col] = test_data[col].astype('str').astype('category')
In [778]:
## Get column data types of test data.
test_data.dtypes
Out[778]:
Id                  int64
MSSubClass          int64
MSZoning         category
LotFrontage       float64
LotArea             int64
Street           category
Alley            category
LotShape         category
LandContour      category
Utilities        category
LotConfig        category
LandSlope        category
Neighborhood     category
Condition1       category
Condition2       category
BldgType         category
HouseStyle       category
OverallQual         int64
OverallCond         int64
YearBuilt           int64
YearRemodAdd        int64
RoofStyle        category
RoofMatl         category
Exterior1st      category
Exterior2nd      category
MasVnrType       category
MasVnrArea        float64
ExterQual        category
ExterCond        category
Foundation       category
BsmtQual         category
BsmtCond         category
BsmtExposure     category
BsmtFinType1     category
BsmtFinSF1        float64
BsmtFinType2     category
BsmtFinSF2        float64
BsmtUnfSF         float64
TotalBsmtSF       float64
Heating          category
HeatingQC        category
CentralAir       category
Electrical       category
1stFlrSF            int64
2ndFlrSF            int64
LowQualFinSF        int64
GrLivArea           int64
BsmtFullBath      float64
BsmtHalfBath      float64
FullBath            int64
HalfBath            int64
BedroomAbvGr        int64
KitchenAbvGr        int64
KitchenQual      category
TotRmsAbvGrd        int64
Functional       category
Fireplaces          int64
FireplaceQu      category
GarageType       category
GarageYrBlt       float64
GarageFinish     category
GarageCars        float64
GarageArea        float64
GarageQual       category
GarageCond       category
PavedDrive       category
WoodDeckSF          int64
OpenPorchSF         int64
EnclosedPorch       int64
3SsnPorch           int64
ScreenPorch         int64
PoolArea            int64
PoolQC           category
Fence            category
MiscFeature      category
MiscVal             int64
MoSold              int64
YrSold              int64
SaleType         category
SaleCondition    category
dtype: object
In [779]:
## Convert numeric columns into categorical varibles
cols = ['MSSubClass','OverallQual','OverallCond']
for col in cols:
    data[col] = data[col].astype('str').astype('category')
In [780]:
## Convert numeric data types to categorical data types.
for col in cols:
    test_data[col] = test_data[col].astype('str').astype('category')
In [781]:
## Get columns data types of train data.
data.dtypes
Out[781]:
Id                  int64
MSSubClass       category
MSZoning         category
LotFrontage       float64
LotArea             int64
Street           category
Alley            category
LotShape         category
LandContour      category
Utilities        category
LotConfig        category
LandSlope        category
Neighborhood     category
Condition1       category
Condition2       category
BldgType         category
HouseStyle       category
OverallQual      category
OverallCond      category
YearBuilt           int64
YearRemodAdd        int64
RoofStyle        category
RoofMatl         category
Exterior1st      category
Exterior2nd      category
MasVnrType       category
MasVnrArea        float64
ExterQual        category
ExterCond        category
Foundation       category
BsmtQual         category
BsmtCond         category
BsmtExposure     category
BsmtFinType1     category
BsmtFinSF1          int64
BsmtFinType2     category
BsmtFinSF2          int64
BsmtUnfSF           int64
TotalBsmtSF         int64
Heating          category
HeatingQC        category
CentralAir       category
Electrical       category
1stFlrSF            int64
2ndFlrSF            int64
LowQualFinSF        int64
GrLivArea           int64
BsmtFullBath        int64
BsmtHalfBath        int64
FullBath            int64
HalfBath            int64
BedroomAbvGr        int64
KitchenAbvGr        int64
KitchenQual      category
TotRmsAbvGrd        int64
Functional       category
Fireplaces          int64
FireplaceQu      category
GarageType       category
GarageYrBlt       float64
GarageFinish     category
GarageCars          int64
GarageArea          int64
GarageQual       category
GarageCond       category
PavedDrive       category
WoodDeckSF          int64
OpenPorchSF         int64
EnclosedPorch       int64
3SsnPorch           int64
ScreenPorch         int64
PoolArea            int64
PoolQC           category
Fence            category
MiscFeature      category
MiscVal             int64
MoSold              int64
YrSold              int64
SaleType         category
SaleCondition    category
SalePrice           int64
dtype: object
In [782]:
## Get columns data types of test data.
test_data.dtypes
Out[782]:
Id                  int64
MSSubClass       category
MSZoning         category
LotFrontage       float64
LotArea             int64
Street           category
Alley            category
LotShape         category
LandContour      category
Utilities        category
LotConfig        category
LandSlope        category
Neighborhood     category
Condition1       category
Condition2       category
BldgType         category
HouseStyle       category
OverallQual      category
OverallCond      category
YearBuilt           int64
YearRemodAdd        int64
RoofStyle        category
RoofMatl         category
Exterior1st      category
Exterior2nd      category
MasVnrType       category
MasVnrArea        float64
ExterQual        category
ExterCond        category
Foundation       category
BsmtQual         category
BsmtCond         category
BsmtExposure     category
BsmtFinType1     category
BsmtFinSF1        float64
BsmtFinType2     category
BsmtFinSF2        float64
BsmtUnfSF         float64
TotalBsmtSF       float64
Heating          category
HeatingQC        category
CentralAir       category
Electrical       category
1stFlrSF            int64
2ndFlrSF            int64
LowQualFinSF        int64
GrLivArea           int64
BsmtFullBath      float64
BsmtHalfBath      float64
FullBath            int64
HalfBath            int64
BedroomAbvGr        int64
KitchenAbvGr        int64
KitchenQual      category
TotRmsAbvGrd        int64
Functional       category
Fireplaces          int64
FireplaceQu      category
GarageType       category
GarageYrBlt       float64
GarageFinish     category
GarageCars        float64
GarageArea        float64
GarageQual       category
GarageCond       category
PavedDrive       category
WoodDeckSF          int64
OpenPorchSF         int64
EnclosedPorch       int64
3SsnPorch           int64
ScreenPorch         int64
PoolArea            int64
PoolQC           category
Fence            category
MiscFeature      category
MiscVal             int64
MoSold              int64
YrSold              int64
SaleType         category
SaleCondition    category
dtype: object
In [783]:
## Set index for train data and display first 5 records.
data = data.set_index('Id')
data.head()
Out[783]:
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
Id
1 60 RL 65.0 8450 Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NF Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NP NF NE 0 2 2008 WD Normal 208500
2 20 RL 80.0 9600 Pave NAA Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0.0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976.0 RFn 2 460 TA TA Y 298 0 0 0 0 0 NP NF NE 0 5 2007 WD Normal 181500
3 60 RL 68.0 11250 Pave NAA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162.0 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001.0 RFn 2 608 TA TA Y 0 42 0 0 0 0 NP NF NE 0 9 2008 WD Normal 223500
4 70 RL 60.0 9550 Pave NAA IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0.0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998.0 Unf 3 642 TA TA Y 0 35 272 0 0 0 NP NF NE 0 2 2006 WD Abnorml 140000
5 60 RL 84.0 14260 Pave NAA IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350.0 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000.0 RFn 3 836 TA TA Y 192 84 0 0 0 0 NP NF NE 0 12 2008 WD Normal 250000
In [784]:
## Set index for test data and display first 5 records.
test_data = test_data.set_index('Id')
test_data.head()
Out[784]:
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
Id
1461 20 RH 80.0 11622 Pave NAA Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 1961 1961 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA No Rec 468.0 LwQ 144.0 270.0 882.0 GasA TA Y SBrkr 896 0 0 896 0.0 0.0 1 0 2 1 TA 5 Typ 0 NF Attchd 1961.0 Unf 1.0 730.0 TA TA Y 140 0 0 0 120 0 NP MnPrv NE 0 6 2010 WD Normal
1462 20 RL 81.0 14267 Pave NAA IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 6 1958 1958 Hip CompShg Wd Sdng Wd Sdng BrkFace 108.0 TA TA CBlock TA TA No ALQ 923.0 Unf 0.0 406.0 1329.0 GasA TA Y SBrkr 1329 0 0 1329 0.0 0.0 1 1 3 1 Gd 6 Typ 0 NF Attchd 1958.0 Unf 1.0 312.0 TA TA Y 393 36 0 0 0 0 NP NF Gar2 12500 6 2010 WD Normal
1463 60 RL 74.0 13830 Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 5 5 1997 1998 Gable CompShg VinylSd VinylSd None 0.0 TA TA PConc Gd TA No GLQ 791.0 Unf 0.0 137.0 928.0 GasA Gd Y SBrkr 928 701 0 1629 0.0 0.0 2 1 3 1 TA 6 Typ 1 TA Attchd 1997.0 Fin 2.0 482.0 TA TA Y 212 34 0 0 0 0 NP MnPrv NE 0 3 2010 WD Normal
1464 60 RL 78.0 9978 Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 6 1998 1998 Gable CompShg VinylSd VinylSd BrkFace 20.0 TA TA PConc TA TA No GLQ 602.0 Unf 0.0 324.0 926.0 GasA Ex Y SBrkr 926 678 0 1604 0.0 0.0 2 1 3 1 Gd 7 Typ 1 Gd Attchd 1998.0 Fin 2.0 470.0 TA TA Y 360 36 0 0 0 0 NP NF NE 0 6 2010 WD Normal
1465 120 RL 43.0 5005 Pave NAA IR1 HLS AllPub Inside Gtl StoneBr Norm Norm TwnhsE 1Story 8 5 1992 1992 Gable CompShg HdBoard HdBoard None 0.0 Gd TA PConc Gd TA No ALQ 263.0 Unf 0.0 1017.0 1280.0 GasA Ex Y SBrkr 1280 0 0 1280 0.0 0.0 2 0 2 1 Gd 5 Typ 0 NF Attchd 1992.0 RFn 2.0 506.0 TA TA Y 0 82 0 0 144 0 NP NF NE 0 1 2010 WD Normal
In [785]:
## Seperate numeric and categorical columns for train data.
cat_columns = data.select_dtypes(include=['category'])
num_columns = data.select_dtypes(include=['int64', 'float64'])
In [786]:
## Seperate numeric and categorical columns for test data.
test_cat_columns = test_data.select_dtypes(include=['category'])
test_num_columns = test_data.select_dtypes(include=['int64', 'float64'])
In [787]:
## Get unique values for BsmtFullBath column of train data.
data.BsmtFullBath.nunique()
Out[787]:
4
In [788]:
## Below logic is used for checking special charcter in numeric columns(Train data).

for col in num_columns.columns:
    
    print('\n',col,'----->')
    for index in range(1,len(data)):
        try:
            skip=float(data.loc[index,col])
            skip=int(data.loc[index,col])
        except ValueError :
            print(index,data.loc[index,col])
            
 LotFrontage ----->
8 nan
13 nan
15 nan
17 nan
25 nan
32 nan
43 nan
44 nan
51 nan
65 nan
67 nan
77 nan
85 nan
96 nan
101 nan
105 nan
112 nan
114 nan
117 nan
121 nan
127 nan
132 nan
134 nan
137 nan
148 nan
150 nan
153 nan
154 nan
161 nan
167 nan
170 nan
171 nan
178 nan
181 nan
187 nan
192 nan
204 nan
208 nan
209 nan
215 nan
219 nan
222 nan
235 nan
238 nan
245 nan
250 nan
270 nan
288 nan
289 nan
294 nan
308 nan
309 nan
311 nan
320 nan
329 nan
331 nan
336 nan
343 nan
347 nan
348 nan
352 nan
357 nan
361 nan
362 nan
365 nan
367 nan
370 nan
371 nan
376 nan
385 nan
393 nan
394 nan
405 nan
406 nan
413 nan
422 nan
427 nan
448 nan
453 nan
458 nan
459 nan
460 nan
466 nan
471 nan
485 nan
491 nan
497 nan
517 nan
519 nan
530 nan
538 nan
539 nan
540 nan
542 nan
546 nan
560 nan
561 nan
565 nan
570 nan
581 nan
594 nan
611 nan
612 nan
613 nan
617 nan
624 nan
627 nan
642 nan
646 nan
661 nan
667 nan
669 nan
673 nan
680 nan
683 nan
686 nan
688 nan
691 nan
707 nan
710 nan
715 nan
721 nan
722 nan
727 nan
735 nan
746 nan
747 nan
752 nan
758 nan
771 nan
784 nan
786 nan
790 nan
792 nan
795 nan
812 nan
817 nan
818 nan
823 nan
829 nan
841 nan
846 nan
852 nan
854 nan
856 nan
857 nan
860 nan
866 nan
869 nan
880 nan
883 nan
894 nan
901 nan
905 nan
909 nan
912 nan
918 nan
926 nan
928 nan
929 nan
930 nan
940 nan
942 nan
945 nan
954 nan
962 nan
968 nan
976 nan
981 nan
984 nan
989 nan
997 nan
998 nan
1004 nan
1007 nan
1018 nan
1019 nan
1025 nan
1031 nan
1033 nan
1034 nan
1036 nan
1038 nan
1042 nan
1046 nan
1058 nan
1060 nan
1065 nan
1078 nan
1085 nan
1087 nan
1098 nan
1109 nan
1111 nan
1117 nan
1123 nan
1125 nan
1139 nan
1142 nan
1144 nan
1147 nan
1149 nan
1154 nan
1155 nan
1162 nan
1165 nan
1178 nan
1181 nan
1191 nan
1194 nan
1207 nan
1214 nan
1231 nan
1234 nan
1245 nan
1248 nan
1252 nan
1254 nan
1261 nan
1263 nan
1269 nan
1271 nan
1272 nan
1273 nan
1277 nan
1278 nan
1287 nan
1288 nan
1291 nan
1301 nan
1302 nan
1310 nan
1313 nan
1319 nan
1322 nan
1343 nan
1347 nan
1349 nan
1355 nan
1357 nan
1358 nan
1359 nan
1363 nan
1366 nan
1369 nan
1374 nan
1382 nan
1384 nan
1397 nan
1408 nan
1418 nan
1420 nan
1424 nan
1425 nan
1430 nan
1432 nan
1442 nan
1444 nan
1447 nan

 LotArea ----->

 YearBuilt ----->

 YearRemodAdd ----->

 MasVnrArea ----->
235 nan
530 nan
651 nan
937 nan
974 nan
978 nan
1244 nan
1279 nan

 BsmtFinSF1 ----->

 BsmtFinSF2 ----->

 BsmtUnfSF ----->

 TotalBsmtSF ----->

 1stFlrSF ----->

 2ndFlrSF ----->

 LowQualFinSF ----->

 GrLivArea ----->

 BsmtFullBath ----->

 BsmtHalfBath ----->

 FullBath ----->

 HalfBath ----->

 BedroomAbvGr ----->

 KitchenAbvGr ----->

 TotRmsAbvGrd ----->

 Fireplaces ----->

 GarageYrBlt ----->
40 nan
49 nan
79 nan
89 nan
90 nan
100 nan
109 nan
126 nan
128 nan
141 nan
149 nan
156 nan
164 nan
166 nan
199 nan
211 nan
242 nan
251 nan
288 nan
292 nan
308 nan
376 nan
387 nan
394 nan
432 nan
435 nan
442 nan
465 nan
496 nan
521 nan
529 nan
534 nan
536 nan
563 nan
583 nan
614 nan
615 nan
621 nan
636 nan
637 nan
639 nan
650 nan
706 nan
711 nan
739 nan
751 nan
785 nan
827 nan
844 nan
922 nan
943 nan
955 nan
961 nan
969 nan
971 nan
977 nan
1010 nan
1012 nan
1031 nan
1039 nan
1097 nan
1124 nan
1132 nan
1138 nan
1144 nan
1174 nan
1180 nan
1219 nan
1220 nan
1235 nan
1258 nan
1284 nan
1324 nan
1326 nan
1327 nan
1338 nan
1350 nan
1408 nan
1450 nan
1451 nan
1454 nan

 GarageCars ----->

 GarageArea ----->

 WoodDeckSF ----->

 OpenPorchSF ----->

 EnclosedPorch ----->

 3SsnPorch ----->

 ScreenPorch ----->

 PoolArea ----->

 MiscVal ----->

 MoSold ----->

 YrSold ----->

 SalePrice ----->
In [790]:
## Display train data numeric columns.
num_columns.columns
Out[790]:
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
       '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
       'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold', 'SalePrice'],
      dtype='object')
In [791]:
## Display test data numeric columns.
test_num_columns.columns
Out[791]:
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
       '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
       'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold'],
      dtype='object')
In [793]:
## Check corrlation between numeric columns of train data.
data[num_columns.columns].corr()
Out[793]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold SalePrice
LotFrontage 1.000000 0.426095 0.123349 0.088866 0.193458 0.233633 0.049900 0.132644 0.392075 0.457181 0.080177 0.038469 0.402797 0.100949 -0.007234 0.198769 0.053532 0.263170 -0.006069 0.352096 0.266639 0.070250 0.285691 0.344997 0.088521 0.151972 0.010700 0.070029 0.041383 0.206167 0.003368 0.011200 0.007450 0.351799
LotArea 0.426095 1.000000 0.014228 0.013788 0.104160 0.214103 0.111170 -0.002618 0.260833 0.299475 0.050986 0.004779 0.263116 0.158155 0.048046 0.126031 0.014259 0.119690 -0.017784 0.190015 0.271364 -0.024947 0.154871 0.180403 0.171698 0.084774 -0.018340 0.020423 0.043160 0.077672 0.038068 0.001205 -0.014261 0.263843
YearBuilt 0.123349 0.014228 1.000000 0.592855 0.315707 0.249503 -0.049107 0.149040 0.391452 0.281986 0.010308 -0.183784 0.199010 0.187599 -0.038162 0.468271 0.242656 -0.070651 -0.174800 0.095589 0.147716 0.825667 0.537850 0.478954 0.224880 0.188686 -0.387268 0.031355 -0.050364 0.004950 -0.034383 0.012398 -0.013618 0.522897
YearRemodAdd 0.088866 0.013788 0.592855 1.000000 0.179618 0.128451 -0.067759 0.181133 0.291066 0.240379 0.140024 -0.062419 0.287389 0.119470 -0.012337 0.439046 0.183331 -0.040581 -0.149598 0.191740 0.112581 0.642277 0.420622 0.371600 0.205726 0.226298 -0.193919 0.045286 -0.038740 0.005829 -0.010286 0.021490 0.035743 0.507101
MasVnrArea 0.193458 0.104160 0.315707 0.179618 1.000000 0.264736 -0.072319 0.114442 0.363936 0.344501 0.174561 -0.069071 0.390857 0.085310 0.026673 0.276833 0.201444 0.102821 -0.037610 0.280682 0.249070 0.252691 0.364204 0.373066 0.159718 0.125703 -0.110204 0.018796 0.061466 0.011723 -0.029815 -0.005965 -0.008201 0.477493
BsmtFinSF1 0.233633 0.214103 0.249503 0.128451 0.264736 1.000000 -0.050117 -0.495251 0.522396 0.445863 -0.137079 -0.064503 0.208171 0.649212 0.067418 0.058543 0.004262 -0.107355 -0.081007 0.044316 0.260011 0.153484 0.224054 0.296970 0.204306 0.111761 -0.102303 0.026451 0.062021 0.140491 0.003571 -0.015727 0.014359 0.386420
BsmtFinSF2 0.049900 0.111170 -0.049107 -0.067759 -0.072319 -0.050117 1.000000 -0.209294 0.104810 0.097117 -0.099260 0.014807 -0.009640 0.158678 0.070948 -0.076444 -0.032148 -0.015728 -0.040751 -0.035227 0.046921 -0.088011 -0.038264 -0.018227 0.067898 0.003093 0.036543 -0.029993 0.088871 0.041709 0.004940 -0.015211 0.031706 -0.011378
BsmtUnfSF 0.132644 -0.002618 0.149040 0.181133 0.114442 -0.495251 -0.209294 1.000000 0.415360 0.317987 0.004469 0.028167 0.240257 -0.422900 -0.095804 0.288886 -0.041118 0.166643 0.030086 0.250647 0.051575 0.190708 0.214175 0.183303 -0.005316 0.129005 -0.002538 0.020764 -0.012579 -0.035092 -0.023837 0.034888 -0.041258 0.214479
TotalBsmtSF 0.392075 0.260833 0.391452 0.291066 0.363936 0.522396 0.104810 0.415360 1.000000 0.819530 -0.174512 -0.033245 0.454868 0.307351 -0.000315 0.323722 -0.048804 0.050450 -0.068901 0.285573 0.339519 0.322445 0.434585 0.486665 0.232019 0.247264 -0.095478 0.037384 0.084489 0.126053 -0.018479 0.013196 -0.014969 0.613581
1stFlrSF 0.457181 0.299475 0.281986 0.240379 0.344501 0.445863 0.097117 0.317987 0.819530 1.000000 -0.202646 -0.014241 0.566024 0.244671 0.001956 0.380637 -0.119916 0.127401 0.068101 0.409516 0.410531 0.233449 0.439317 0.489782 0.235459 0.211671 -0.065292 0.056104 0.088758 0.131525 -0.021096 0.031372 -0.013604 0.605852
2ndFlrSF 0.080177 0.050986 0.010308 0.140024 0.174561 -0.137079 -0.099260 0.004469 -0.174512 -0.202646 1.000000 0.063353 0.687501 -0.169494 -0.023855 0.421378 0.609707 0.502901 0.059306 0.616423 0.194561 0.070832 0.183926 0.138347 0.092165 0.208026 0.061989 -0.024358 0.040606 0.081487 0.016197 0.035164 -0.028700 0.319334
LowQualFinSF 0.038469 0.004779 -0.183784 -0.062419 -0.069071 -0.064503 0.014807 0.028167 -0.033245 -0.014241 0.063353 1.000000 0.134683 -0.047143 -0.005842 -0.000710 -0.027080 0.105607 0.007522 0.131185 -0.021272 -0.036363 -0.094480 -0.067601 -0.025444 0.018251 0.061081 -0.004296 0.026799 0.062157 -0.003793 -0.022174 -0.028921 -0.025606
GrLivArea 0.402797 0.263116 0.199010 0.287389 0.390857 0.208171 -0.009640 0.240257 0.454868 0.566024 0.687501 0.134683 1.000000 0.034836 -0.018918 0.630012 0.415772 0.521270 0.100063 0.825489 0.461679 0.231197 0.467247 0.468997 0.247433 0.330224 0.009113 0.020643 0.101510 0.170205 -0.002416 0.050240 -0.036526 0.708624
BsmtFullBath 0.100949 0.158155 0.187599 0.119470 0.085310 0.649212 0.158678 -0.422900 0.307351 0.244671 -0.169494 -0.047143 0.034836 1.000000 -0.147871 -0.064512 -0.030905 -0.150673 -0.041503 -0.053275 0.137928 0.124553 0.131881 0.179189 0.175315 0.067341 -0.049911 -0.000106 0.023148 0.067616 -0.023047 -0.025361 0.067049 0.227122
BsmtHalfBath -0.007234 0.048046 -0.038162 -0.012337 0.026673 0.067418 0.070948 -0.095804 -0.000315 0.001956 -0.023855 -0.005842 -0.018918 -0.147871 1.000000 -0.054536 -0.012340 0.046519 -0.037944 -0.023836 0.028976 -0.077464 -0.020891 -0.024536 0.040161 -0.025324 -0.008555 0.035114 0.032121 0.020025 -0.007367 0.032873 -0.046524 -0.016844
FullBath 0.198769 0.126031 0.468271 0.439046 0.276833 0.058543 -0.076444 0.288886 0.323722 0.380637 0.421378 -0.000710 0.630012 -0.064512 -0.054536 1.000000 0.136381 0.363252 0.133115 0.554784 0.243671 0.484557 0.469672 0.405656 0.187703 0.259977 -0.115093 0.035353 -0.008106 0.049604 -0.014290 0.055872 -0.019669 0.560664
HalfBath 0.053532 0.014259 0.242656 0.183331 0.201444 0.004262 -0.032148 -0.041118 -0.048804 -0.119916 0.609707 -0.027080 0.415772 -0.030905 -0.012340 0.136381 1.000000 0.226651 -0.068263 0.343415 0.203649 0.196785 0.219178 0.163549 0.108080 0.199740 -0.095317 -0.004972 0.072426 0.022381 0.001290 -0.009050 -0.010269 0.284108
BedroomAbvGr 0.263170 0.119690 -0.070651 -0.040581 0.102821 -0.107355 -0.015728 0.166643 0.050450 0.127401 0.502901 0.105607 0.521270 -0.150673 0.046519 0.363252 0.226651 1.000000 0.198597 0.676620 0.107570 -0.064518 0.086106 0.065253 0.046854 0.093810 0.041570 -0.024478 0.044300 0.070703 0.007767 0.046544 -0.036014 0.168213
KitchenAbvGr -0.006069 -0.017784 -0.174800 -0.149598 -0.037610 -0.081007 -0.040751 0.030086 -0.068901 0.068101 0.059306 0.007522 0.100063 -0.041503 -0.037944 0.133115 -0.068263 0.198597 1.000000 0.256045 -0.123936 -0.124411 -0.050634 -0.064433 -0.090130 -0.070091 0.037312 -0.024600 -0.051613 -0.014525 0.062341 0.026589 0.031687 -0.135907
TotRmsAbvGrd 0.352096 0.190015 0.095589 0.191740 0.280682 0.044316 -0.035227 0.250647 0.285573 0.409516 0.616423 0.131185 0.825489 -0.053275 -0.023836 0.554784 0.343415 0.676620 0.256045 1.000000 0.326114 0.148112 0.362289 0.337822 0.165984 0.234192 0.004151 -0.006683 0.059383 0.083757 0.024763 0.036907 -0.034516 0.533723
Fireplaces 0.266639 0.271364 0.147716 0.112581 0.249070 0.260011 0.046921 0.051575 0.339519 0.410531 0.194561 -0.021272 0.461679 0.137928 0.028976 0.243671 0.203649 0.107570 -0.123936 0.326114 1.000000 0.046822 0.300789 0.269141 0.200019 0.169405 -0.024822 0.011257 0.184530 0.095074 0.001409 0.046357 -0.024096 0.466929
GarageYrBlt 0.070250 -0.024947 0.825667 0.642277 0.252691 0.153484 -0.088011 0.190708 0.322445 0.233449 0.070832 -0.036363 0.231197 0.124553 -0.077464 0.484557 0.196785 -0.064518 -0.124411 0.148112 0.046822 1.000000 0.588920 0.564567 0.224577 0.228425 -0.297003 0.023544 -0.075418 -0.014501 -0.032417 0.005337 -0.001014 0.486362
GarageCars 0.285691 0.154871 0.537850 0.420622 0.364204 0.224054 -0.038264 0.214175 0.434585 0.439317 0.183926 -0.094480 0.467247 0.131881 -0.020891 0.469672 0.219178 0.086106 -0.050634 0.362289 0.300789 0.588920 1.000000 0.882475 0.226342 0.213569 -0.151434 0.035765 0.050494 0.020934 -0.043080 0.040522 -0.039117 0.640409
GarageArea 0.344997 0.180403 0.478954 0.371600 0.373066 0.296970 -0.018227 0.183303 0.486665 0.489782 0.138347 -0.067601 0.468997 0.179189 -0.024536 0.405656 0.163549 0.065253 -0.064433 0.337822 0.269141 0.564567 0.882475 1.000000 0.224666 0.241435 -0.121777 0.035087 0.051412 0.061047 -0.027400 0.027974 -0.027378 0.623431
WoodDeckSF 0.088521 0.171698 0.224880 0.205726 0.159718 0.204306 0.067898 -0.005316 0.232019 0.235459 0.092165 -0.025444 0.247433 0.175315 0.040161 0.187703 0.108080 0.046854 -0.090130 0.165984 0.200019 0.224577 0.226342 0.224666 1.000000 0.058661 -0.125989 -0.032771 -0.074181 0.073378 -0.009551 0.021011 0.022270 0.324413
OpenPorchSF 0.151972 0.084774 0.188686 0.226298 0.125703 0.111761 0.003093 0.129005 0.247264 0.211671 0.208026 0.018251 0.330224 0.067341 -0.025324 0.259977 0.199740 0.093810 -0.070091 0.234192 0.169405 0.228425 0.213569 0.241435 0.058661 1.000000 -0.093079 -0.005842 0.074304 0.060762 -0.018584 0.071255 -0.057619 0.315856
EnclosedPorch 0.010700 -0.018340 -0.387268 -0.193919 -0.110204 -0.102303 0.036543 -0.002538 -0.095478 -0.065292 0.061989 0.061081 0.009113 -0.049911 -0.008555 -0.115093 -0.095317 0.041570 0.037312 0.004151 -0.024822 -0.297003 -0.151434 -0.121777 -0.125989 -0.093079 1.000000 -0.037305 -0.082864 0.054203 0.018361 -0.028887 -0.009916 -0.128578
3SsnPorch 0.070029 0.020423 0.031355 0.045286 0.018796 0.026451 -0.029993 0.020764 0.037384 0.056104 -0.024358 -0.004296 0.020643 -0.000106 0.035114 0.035353 -0.004972 -0.024478 -0.024600 -0.006683 0.011257 0.023544 0.035765 0.035087 -0.032771 -0.005842 -0.037305 1.000000 -0.031436 -0.007992 0.000354 0.029474 0.018645 0.044584
ScreenPorch 0.041383 0.043160 -0.050364 -0.038740 0.061466 0.062021 0.088871 -0.012579 0.084489 0.088758 0.040606 0.026799 0.101510 0.023148 0.032121 -0.008106 0.072426 0.044300 -0.051613 0.059383 0.184530 -0.075418 0.050494 0.051412 -0.074181 0.074304 -0.082864 -0.031436 1.000000 0.051307 0.031946 0.023217 0.010694 0.111447
PoolArea 0.206167 0.077672 0.004950 0.005829 0.011723 0.140491 0.041709 -0.035092 0.126053 0.131525 0.081487 0.062157 0.170205 0.067616 0.020025 0.049604 0.022381 0.070703 -0.014525 0.083757 0.095074 -0.014501 0.020934 0.061047 0.073378 0.060762 0.054203 -0.007992 0.051307 1.000000 0.029669 -0.033737 -0.059689 0.092404
MiscVal 0.003368 0.038068 -0.034383 -0.010286 -0.029815 0.003571 0.004940 -0.023837 -0.018479 -0.021096 0.016197 -0.003793 -0.002416 -0.023047 -0.007367 -0.014290 0.001290 0.007767 0.062341 0.024763 0.001409 -0.032417 -0.043080 -0.027400 -0.009551 -0.018584 0.018361 0.000354 0.031946 0.029669 1.000000 -0.006495 0.004906 -0.021190
MoSold 0.011200 0.001205 0.012398 0.021490 -0.005965 -0.015727 -0.015211 0.034888 0.013196 0.031372 0.035164 -0.022174 0.050240 -0.025361 0.032873 0.055872 -0.009050 0.046544 0.026589 0.036907 0.046357 0.005337 0.040522 0.027974 0.021011 0.071255 -0.028887 0.029474 0.023217 -0.033737 -0.006495 1.000000 -0.145721 0.046432
YrSold 0.007450 -0.014261 -0.013618 0.035743 -0.008201 0.014359 0.031706 -0.041258 -0.014969 -0.013604 -0.028700 -0.028921 -0.036526 0.067049 -0.046524 -0.019669 -0.010269 -0.036014 0.031687 -0.034516 -0.024096 -0.001014 -0.039117 -0.027378 0.022270 -0.057619 -0.009916 0.018645 0.010694 -0.059689 0.004906 -0.145721 1.000000 -0.028923
SalePrice 0.351799 0.263843 0.522897 0.507101 0.477493 0.386420 -0.011378 0.214479 0.613581 0.605852 0.319334 -0.025606 0.708624 0.227122 -0.016844 0.560664 0.284108 0.168213 -0.135907 0.533723 0.466929 0.486362 0.640409 0.623431 0.324413 0.315856 -0.128578 0.044584 0.111447 0.092404 -0.021190 0.046432 -0.028923 1.000000
In [794]:
## Check corrlation between numeric columns of test data.
test_data[test_num_columns.columns].corr()
Out[794]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
LotFrontage 1.000000 0.644608 0.122356 0.092603 0.251533 0.204621 0.046824 0.092031 0.315802 0.461239 -0.036185 -0.037294 0.357125 0.127314 -0.042779 0.163078 0.023850 0.205100 0.016072 0.344366 0.257037 0.082069 0.336373 0.375581 0.157426 0.179795 0.013340 -0.037487 0.113444 0.134232 0.068161 0.008810 -0.025263
LotArea 0.644608 1.000000 0.048314 0.036907 0.188691 0.185470 0.054199 0.071681 0.283049 0.456417 -0.007862 -0.012457 0.366324 0.094052 -0.008378 0.147871 0.079581 0.181171 -0.031830 0.289576 0.282210 0.018330 0.263398 0.315841 0.158483 0.164815 0.099850 -0.001846 0.088712 0.140494 0.139071 0.005152 -0.051144
YearBuilt 0.122356 0.048314 1.000000 0.631696 0.312404 0.309595 -0.008174 0.111892 0.425447 0.338733 0.025195 -0.101154 0.290412 0.234922 -0.022947 0.474028 0.296700 -0.035923 -0.098644 0.134839 0.193597 0.844150 0.538428 0.482497 0.233889 0.208040 -0.363012 -0.005442 -0.031984 -0.001060 0.007325 0.015599 -0.011006
YearRemodAdd 0.092603 0.036907 0.631696 1.000000 0.213937 0.175219 -0.056320 0.148773 0.304515 0.243793 0.177177 -0.059973 0.347946 0.150371 -0.076928 0.477064 0.238807 -0.004413 -0.135940 0.203619 0.153965 0.661765 0.431442 0.382034 0.230724 0.258049 -0.243582 0.025823 -0.053761 -0.034862 0.003011 0.011771 0.029715
MasVnrArea 0.251533 0.188691 0.312404 0.213937 1.000000 0.343267 0.037546 0.064672 0.430966 0.446875 0.063659 -0.045886 0.416648 0.198270 0.003992 0.242522 0.182094 0.053259 -0.066331 0.275533 0.301575 0.257439 0.358488 0.375182 0.172721 0.163666 -0.112814 0.005772 0.069339 -0.005395 0.105723 0.005118 -0.029556
BsmtFinSF1 0.204621 0.185470 0.309595 0.175219 0.343267 1.000000 -0.059522 -0.459581 0.550444 0.470077 -0.188952 -0.068178 0.215692 0.628903 0.088971 0.104464 -0.018966 -0.119743 -0.092190 0.060346 0.326075 0.232798 0.285959 0.323800 0.242369 0.136321 -0.097441 0.088241 0.131414 0.012089 0.165403 0.013397 0.030779
BsmtFinSF2 0.046824 0.054199 -0.008174 -0.056320 0.037546 -0.059522 1.000000 -0.265183 0.076092 0.073346 -0.095935 -0.023976 -0.025137 0.166500 0.123715 -0.074858 -0.032627 -0.044951 -0.034790 -0.060361 0.083655 -0.051545 0.005806 0.022391 0.126032 -0.014185 0.029010 -0.014473 0.039806 0.050152 -0.012808 -0.003162 -0.011749
BsmtUnfSF 0.092031 0.071681 0.111892 0.148773 0.064672 -0.459581 -0.265183 1.000000 0.409023 0.275569 -0.006196 0.067212 0.226780 -0.374659 -0.117686 0.257721 -0.030584 0.199661 0.102069 0.243566 -0.043014 0.153246 0.147005 0.145625 -0.073174 0.111249 0.012468 -0.046230 -0.085111 -0.029672 0.000320 0.009132 -0.035214
TotalBsmtSF 0.315802 0.283049 0.425447 0.304515 0.430966 0.550444 0.076092 0.409023 1.000000 0.784538 -0.238633 -0.013294 0.435576 0.343680 0.024722 0.331947 -0.062711 0.056093 -0.007879 0.278408 0.326100 0.372405 0.441401 0.485558 0.227192 0.244300 -0.076275 0.039289 0.066942 0.003147 0.165227 0.021525 -0.007817
1stFlrSF 0.461239 0.456417 0.338733 0.243793 0.446875 0.470077 0.073346 0.275569 0.784538 1.000000 -0.298222 -0.011519 0.560631 0.278594 0.019899 0.365945 -0.088929 0.090188 0.084255 0.374373 0.404611 0.284615 0.441707 0.494192 0.219573 0.263813 -0.066071 0.028680 0.107902 0.112558 0.181387 0.048064 -0.013566
2ndFlrSF -0.036185 -0.007862 0.025195 0.177177 0.063659 -0.188952 -0.095935 -0.006196 -0.238633 -0.298222 1.000000 -0.035688 0.618446 -0.153136 -0.095531 0.384515 0.613430 0.504458 0.079276 0.548464 0.143637 0.100355 0.181314 0.118704 0.087555 0.163780 0.048846 -0.047828 -0.018239 -0.006163 -0.022370 -0.009415 -0.010098
LowQualFinSF -0.037294 -0.012457 -0.101154 -0.059973 -0.045886 -0.068178 -0.023976 0.067212 -0.013294 -0.011519 -0.035688 1.000000 0.050346 -0.046815 -0.020807 -0.004986 -0.053635 0.031993 -0.008344 0.065386 0.008156 -0.065324 -0.038844 -0.038496 -0.005262 -0.020197 0.115254 -0.007149 -0.013932 -0.004606 -0.007424 0.046473 0.026864
GrLivArea 0.357125 0.366324 0.290412 0.347946 0.416648 0.215692 -0.025137 0.226780 0.435576 0.560631 0.618446 0.050346 1.000000 0.088789 -0.069128 0.632701 0.453582 0.513831 0.137003 0.788012 0.456944 0.316098 0.515693 0.504555 0.255416 0.356366 -0.001413 -0.018561 0.071417 0.086542 0.128687 0.035472 -0.017434
BsmtFullBath 0.127314 0.094052 0.234922 0.150371 0.198270 0.628903 0.166500 -0.374659 0.343680 0.278594 -0.153136 -0.046815 0.088789 1.000000 -0.150071 0.025620 -0.035895 -0.159455 0.006600 -0.023118 0.201068 0.174664 0.189895 0.190121 0.196560 0.094315 -0.085258 0.068396 0.081740 0.014860 0.009305 0.018309 0.023824
BsmtHalfBath -0.042779 -0.008378 -0.022947 -0.076928 0.003992 0.088971 0.123715 -0.117686 0.024722 0.019899 -0.095531 -0.020807 -0.069128 -0.150071 1.000000 -0.040176 -0.102009 -0.006723 -0.091839 -0.074981 0.049807 -0.041000 -0.044934 -0.018561 0.062275 -0.044065 -0.011173 0.018202 0.050842 0.127856 0.069801 0.015006 0.006073
FullBath 0.163078 0.147871 0.474028 0.477064 0.242522 0.104464 -0.074858 0.257721 0.331947 0.365945 0.384515 -0.004986 0.632701 0.025620 -0.040176 1.000000 0.180297 0.349285 0.210972 0.500354 0.228681 0.506458 0.489982 0.411278 0.175053 0.260829 -0.122930 -0.012760 -0.023736 0.000768 -0.006936 0.037308 0.010283
HalfBath 0.023850 0.079581 0.296700 0.238807 0.182094 -0.018966 -0.032627 -0.030584 -0.062711 -0.088929 0.613430 -0.053635 0.453582 -0.035895 -0.102009 0.180297 1.000000 0.263638 -0.015793 0.348590 0.207970 0.251409 0.249431 0.194229 0.125136 0.165246 -0.069873 -0.051598 -0.000446 -0.026345 0.046894 0.006309 0.013504
BedroomAbvGr 0.205100 0.181171 -0.035923 -0.004413 0.053259 -0.119743 -0.044951 0.199661 0.056093 0.090188 0.504458 0.031993 0.513831 -0.159455 -0.006723 0.349285 0.263638 1.000000 0.285674 0.664498 0.066133 -0.028440 0.099297 0.082304 0.016902 0.079231 0.057772 -0.085070 -0.028374 -0.007087 -0.005398 0.064727 -0.005113
KitchenAbvGr 0.016072 -0.031830 -0.098644 -0.135940 -0.066331 -0.092190 -0.034790 0.102069 -0.007879 0.084255 0.079276 -0.008344 0.137003 0.006600 -0.091839 0.210972 -0.015793 0.285674 1.000000 0.338219 -0.091652 -0.062182 -0.023325 -0.051080 -0.084779 -0.066172 0.018837 -0.018113 -0.061488 -0.011669 -0.005186 0.044159 0.038614
TotRmsAbvGrd 0.344366 0.289576 0.134839 0.203619 0.275533 0.060346 -0.060361 0.243566 0.278408 0.374373 0.548464 0.065386 0.788012 -0.023118 -0.074981 0.500354 0.348590 0.664498 0.338219 1.000000 0.294427 0.176934 0.355386 0.320217 0.146832 0.244571 0.027953 -0.059911 0.005290 0.055019 0.094063 0.050666 -0.031627
Fireplaces 0.257037 0.282210 0.193597 0.153965 0.301575 0.326075 0.083655 -0.043014 0.326100 0.404611 0.143637 0.008156 0.456944 0.201068 0.049807 0.228681 0.207970 0.066133 -0.091652 0.294427 1.000000 0.128770 0.341988 0.320092 0.254528 0.149040 0.025176 0.028696 0.156343 0.105926 0.014802 0.016598 0.010002
GarageYrBlt 0.082069 0.018330 0.844150 0.661765 0.257439 0.232798 -0.051545 0.153246 0.372405 0.284615 0.100355 -0.065324 0.316098 0.174664 -0.041000 0.506458 0.251409 -0.028440 -0.062182 0.176934 0.128770 1.000000 0.586649 0.548113 0.220850 0.235077 -0.303646 0.016753 -0.049821 -0.015421 0.007926 0.040189 -0.008451
GarageCars 0.336373 0.263398 0.538428 0.431442 0.358488 0.285959 0.005806 0.147005 0.441401 0.441707 0.181314 -0.038844 0.515693 0.189895 -0.044934 0.489982 0.249431 0.099297 -0.023325 0.355386 0.341988 0.586649 1.000000 0.896674 0.254332 0.194292 -0.116620 0.007189 0.036144 0.043302 0.002754 0.060845 -0.007032
GarageArea 0.375581 0.315841 0.482497 0.382034 0.375182 0.323800 0.022391 0.145625 0.485558 0.494192 0.118704 -0.038496 0.504555 0.190121 -0.018561 0.411278 0.194229 0.082304 -0.051080 0.320217 0.320092 0.548113 0.896674 1.000000 0.251051 0.224219 -0.092776 0.022661 0.073089 0.043922 0.036352 0.052470 0.000536
WoodDeckSF 0.157426 0.158483 0.233889 0.230724 0.172721 0.242369 0.126032 -0.073174 0.227192 0.219573 0.087555 -0.005262 0.255416 0.196560 0.062275 0.175053 0.125136 0.016902 -0.084779 0.146832 0.254528 0.220850 0.254332 0.251051 1.000000 0.019488 -0.113036 0.036622 -0.030682 0.123409 0.108898 0.014995 -0.022818
OpenPorchSF 0.179795 0.164815 0.208040 0.258049 0.163666 0.136321 -0.014185 0.111249 0.244300 0.263813 0.163780 -0.020197 0.356366 0.094315 -0.044065 0.260829 0.165246 0.079231 -0.066172 0.244571 0.149040 0.235077 0.194292 0.224219 0.019488 1.000000 -0.030918 -0.013865 0.022233 0.070795 0.150404 -0.000255 -0.017122
EnclosedPorch 0.013340 0.099850 -0.363012 -0.243582 -0.112814 -0.097441 0.029010 0.012468 -0.076275 -0.066071 0.048846 0.115254 -0.001413 -0.085258 -0.011173 -0.122930 -0.069873 0.057772 0.018837 0.027953 0.025176 -0.303646 -0.116620 -0.092776 -0.113036 -0.030918 1.000000 -0.027645 -0.048550 0.142589 0.001353 -0.012543 0.007616
3SsnPorch -0.037487 -0.001846 -0.005442 0.025823 0.005772 0.088241 -0.014473 -0.046230 0.039289 0.028680 -0.047828 -0.007149 -0.018561 0.068396 0.018202 -0.012760 -0.051598 -0.085070 -0.018113 -0.059911 0.028696 0.016753 0.007189 0.022661 0.036622 -0.013865 -0.027645 1.000000 -0.026785 -0.005083 -0.001242 0.022444 0.027818
ScreenPorch 0.113444 0.088712 -0.031984 -0.053761 0.069339 0.131414 0.039806 -0.085111 0.066942 0.107902 -0.018239 -0.013932 0.071417 0.081740 0.050842 -0.023736 -0.000446 -0.028374 -0.061488 0.005290 0.156343 -0.049821 0.036144 0.073089 -0.030682 0.022233 -0.048550 -0.026785 1.000000 -0.004897 -0.012549 0.035212 -0.023439
PoolArea 0.134232 0.140494 -0.001060 -0.034862 -0.005395 0.012089 0.050152 -0.029672 0.003147 0.112558 -0.006163 -0.004606 0.086542 0.014860 0.127856 0.000768 -0.026345 -0.007087 -0.011669 0.055019 0.105926 -0.015421 0.043302 0.043922 0.123409 0.070795 0.142589 -0.005083 -0.004897 1.000000 -0.005279 -0.055731 -0.045185
MiscVal 0.068161 0.139071 0.007325 0.003011 0.105723 0.165403 -0.012808 0.000320 0.165227 0.181387 -0.022370 -0.007424 0.128687 0.009305 0.069801 -0.006936 0.046894 -0.005398 -0.005186 0.094063 0.014802 0.007926 0.002754 0.036352 0.108898 0.150404 0.001353 -0.001242 -0.012549 -0.005279 1.000000 0.019369 0.011829
MoSold 0.008810 0.005152 0.015599 0.011771 0.005118 0.013397 -0.003162 0.009132 0.021525 0.048064 -0.009415 0.046473 0.035472 0.018309 0.015006 0.037308 0.006309 0.064727 0.044159 0.050666 0.016598 0.040189 0.060845 0.052470 0.014995 -0.000255 -0.012543 0.022444 0.035212 -0.055731 0.019369 1.000000 -0.163924
YrSold -0.025263 -0.051144 -0.011006 0.029715 -0.029556 0.030779 -0.011749 -0.035214 -0.007817 -0.013566 -0.010098 0.026864 -0.017434 0.023824 0.006073 0.010283 0.013504 -0.005113 0.038614 -0.031627 0.010002 -0.008451 -0.007032 0.000536 -0.022818 -0.017122 0.007616 0.027818 -0.023439 -0.045185 0.011829 -0.163924 1.000000
In [795]:
## Calculate variance for numeric columns.
def variance(x):
        return(pd.DataFrame({'Datatype' : x.dtypes,
                            'Variance': [round(x[i].var()) for i in x]
                            }))

## Get variance for numeric columns of train data.
variance(num_columns)
Out[795]:
Datatype Variance
LotFrontage float64 590
LotArea int64 99625650
YearBuilt int64 912
YearRemodAdd int64 426
MasVnrArea float64 32785
BsmtFinSF1 int64 208025
BsmtFinSF2 int64 26024
BsmtUnfSF int64 195246
TotalBsmtSF int64 192462
1stFlrSF int64 149450
2ndFlrSF int64 190557
LowQualFinSF int64 2364
GrLivArea int64 276130
BsmtFullBath int64 0
BsmtHalfBath int64 0
FullBath int64 0
HalfBath int64 0
BedroomAbvGr int64 1
KitchenAbvGr int64 0
TotRmsAbvGrd int64 3
Fireplaces int64 0
GarageYrBlt float64 610
GarageCars int64 1
GarageArea int64 45713
WoodDeckSF int64 15710
OpenPorchSF int64 4390
EnclosedPorch int64 3736
3SsnPorch int64 860
ScreenPorch int64 3109
PoolArea int64 1614
MiscVal int64 246138
MoSold int64 7
YrSold int64 2
SalePrice int64 6311111264
In [796]:
## Get variance for numeric columns of test data.
variance(test_num_columns)
Out[796]:
Datatype Variance
LotFrontage float64 501
LotArea int64 24557152
YearBuilt int64 924
YearRemodAdd int64 446
MasVnrArea float64 31551
BsmtFinSF1 float64 207269
BsmtFinSF2 float64 31242
BsmtUnfSF float64 191197
TotalBsmtSF float64 196159
1stFlrSF int64 158536
2ndFlrSF int64 176913
LowQualFinSF int64 1940
GrLivArea int64 235774
BsmtFullBath float64 0
BsmtHalfBath float64 0
FullBath int64 0
HalfBath int64 0
BedroomAbvGr int64 1
KitchenAbvGr int64 0
TotRmsAbvGrd int64 2
Fireplaces int64 0
GarageYrBlt float64 699
GarageCars float64 1
GarageArea float64 47110
WoodDeckSF int64 16319
OpenPorchSF int64 4745
EnclosedPorch int64 4520
3SsnPorch int64 408
ScreenPorch int64 3205
PoolArea int64 930
MiscVal int64 397917
MoSold int64 7
YrSold int64 2
In [797]:
## Drop zero variance variable from train data set.
cols = ['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','KitchenAbvGr','Fireplaces']
data = data.drop(cols,axis=1)
num_columns = num_columns.drop(cols,axis=1)
In [798]:
## Drop zero variance variable from test data set.
test_data = test_data.drop(cols,axis=1)
test_num_columns = test_num_columns.drop(cols,axis=1)
In [799]:
## Get first record of train data after dropping few columns.
data[:1]
Out[799]:
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr KitchenQual TotRmsAbvGrd Functional FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
Id
1 60 RL 65.0 8450 Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196.0 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 3 Gd 8 Typ NF Attchd 2003.0 RFn 2 548 TA TA Y 0 61 0 0 0 0 NP NF NE 0 2 2008 WD Normal 208500
In [800]:
## Get first record of test data after dropping few columns.
test_data[:1]
Out[800]:
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr KitchenQual TotRmsAbvGrd Functional FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
Id
1461 20 RH 80.0 11622 Pave NAA Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 1961 1961 Gable CompShg VinylSd VinylSd None 0.0 TA TA CBlock TA TA No Rec 468.0 LwQ 144.0 270.0 882.0 GasA TA Y SBrkr 896 0 0 896 2 TA 5 Typ NF Attchd 1961.0 Unf 1.0 730.0 TA TA Y 140 0 0 0 120 0 NP MnPrv NE 0 6 2010 WD Normal
In [801]:
## Seperate Target and Predictors and display dimensions.

features = data.drop('SalePrice', axis = 1)
print(features.shape)
target = data['SalePrice']
print(target.shape)
(1460, 73)
(1460,)
In [802]:
## Display dimesnions of test data.
test_data.shape
Out[802]:
(1459, 73)
In [805]:
## Split data into train and validation.
X_train,X_test,y_train,y_test=train_test_split(features,target,test_size=0.3,random_state=123)
In [807]:
## Seperate category and numeric  columns for train data.
catcols_train = X_train.select_dtypes(include=['object','category'])
numcols_train = X_train.select_dtypes(include=['int64', 'float64'])
In [808]:
## Seperate category and numeric  columns for test data.
test_catcols = test_data.select_dtypes(include=['object','category'])
test_numcols = test_data.select_dtypes(include=['int64', 'float64'])
In [809]:
## Display dimensions and column names of category and numeric columns of train data.
print(catcols_train.shape)
print(catcols_train.columns)
print(numcols_train.shape)
print(numcols_train.columns)
(1022, 46)
Index(['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
       'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
       'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
       'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
       'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
      dtype='object')
(1022, 27)
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
       '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BedroomAbvGr', 'TotRmsAbvGrd',
       'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal',
       'MoSold', 'YrSold'],
      dtype='object')
In [810]:
## Display dimensions and column names of category and numeric columns of validation data.
print(test_catcols.shape)
print(test_catcols.columns)
print(test_numcols.shape)
print(test_numcols.columns)
(1459, 46)
Index(['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
       'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
       'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
       'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
       'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
      dtype='object')
(1459, 27)
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
       'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
       '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BedroomAbvGr', 'TotRmsAbvGrd',
       'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal',
       'MoSold', 'YrSold'],
      dtype='object')
In [811]:
## Seperate category and numeric columns from test data.
catcols_test = X_test.select_dtypes(include=['object','category'])
numcols_test = X_test.select_dtypes(include=['int64', 'float64'])
In [812]:
## Display dimesniosn of test data.
catcols_test.shape
Out[812]:
(438, 46)
In [813]:
################################################### Imputation ###############################################################
In [814]:
## Import imputer,scaler libraries for imputing null values.
from sklearn.impute import SimpleImputer
In [815]:
## Instantiate numeric ,category imputers.
num_imputer = SimpleImputer(strategy = 'median')
cat_imputer = SimpleImputer(strategy = 'most_frequent')
In [817]:
## Fit numeric imputer.
num_imputer.fit(numcols_train)

## Impute numeric columns NA values of train data and prepare data frame.
X_train_imp = num_imputer.transform(numcols_train)
X_train_imp =pd.DataFrame(X_train_imp,columns=numcols_train.columns)
In [818]:
## Check NA values for numeric columns of train data after imputing.
X_train_imp.isna().sum()
Out[818]:
LotFrontage      0
LotArea          0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BedroomAbvGr     0
TotRmsAbvGrd     0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
dtype: int64
In [819]:
## Check first 5 records of numeric column of train data.
X_train_imp.head()
Out[819]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 57.0 8846.0 1996.0 1996.0 0.0 298.0 0.0 572.0 870.0 914.0 0.0 0.0 914.0 2.0 5.0 1998.0 2.0 576.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 7.0 2006.0
1 55.0 5350.0 1940.0 1966.0 0.0 0.0 0.0 728.0 728.0 1306.0 0.0 0.0 1306.0 3.0 6.0 1979.0 0.0 0.0 263.0 0.0 0.0 0.0 0.0 0.0 450.0 5.0 2010.0
2 70.0 8521.0 1967.0 1967.0 0.0 842.0 0.0 70.0 912.0 912.0 0.0 0.0 912.0 3.0 5.0 1974.0 1.0 336.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 2010.0
3 84.0 8658.0 1965.0 1965.0 101.0 643.0 0.0 445.0 1088.0 1324.0 0.0 0.0 1324.0 3.0 6.0 1965.0 2.0 440.0 0.0 138.0 0.0 0.0 0.0 0.0 0.0 12.0 2006.0
4 64.0 6762.0 2007.0 2007.0 108.0 664.0 0.0 544.0 1208.0 1208.0 0.0 0.0 1208.0 2.0 6.0 2007.0 2.0 628.0 105.0 54.0 0.0 0.0 0.0 0.0 0.0 9.0 2007.0
In [820]:
## Fit category imputer.
cat_imputer.fit(catcols_train)

## Impute NA values for category columns of train data and prepares a dataframe.
X_train_imp_cat = cat_imputer.transform(catcols_train)
X_train_imp_cat = pd.DataFrame(X_train_imp_cat,columns=catcols_train.columns)
In [821]:
## Check dimesnions of category columns of train data.
X_train_imp_cat.shape
Out[821]:
(1022, 46)
In [822]:
## Check NA values for category columns of train data after imputation.
X_train_imp_cat.isna().sum()
Out[822]:
MSSubClass       0
MSZoning         0
Street           0
Alley            0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
OverallQual      0
OverallCond      0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
FireplaceQu      0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
PoolQC           0
Fence            0
MiscFeature      0
SaleType         0
SaleCondition    0
dtype: int64
In [823]:
## Check first 5 records of category columns of train data.
X_train_imp_cat.head()
Out[823]:
MSSubClass MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 85 RL Pave NAA IR1 Lvl AllPub CulDSac Gtl CollgCr Norm Norm 1Fam SFoyer 5 5 Gable CompShg VinylSd VinylSd None Gd TA PConc Gd TA Av GLQ Unf GasA Ex Y SBrkr TA Typ NF Detchd Unf TA TA Y NP NF NE WD Normal
1 30 RL Pave NAA IR1 Lvl AllPub Inside Gtl BrkSide Norm Norm 1Fam 1Story 3 2 Gable CompShg Wd Sdng Plywood None TA Po CBlock TA TA No Unf Unf GasA Ex Y SBrkr Fa Mod NF NG NG NG NG Y NP GdWo Shed WD Normal
2 20 RL Pave NAA Reg Lvl AllPub FR2 Gtl Sawyer Feedr Norm 1Fam 1Story 5 5 Gable CompShg HdBoard HdBoard None TA TA CBlock TA TA No ALQ Unf GasA TA Y SBrkr TA Typ Fa Detchd Unf TA TA Y NP MnPrv NE WD Normal
3 20 RL Pave NAA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 6 5 Gable CompShg Wd Sdng Wd Sdng BrkFace TA TA CBlock TA TA No Rec Unf GasA Ex Y SBrkr TA Typ TA Attchd RFn TA TA Y NP GdWo NE WD Abnorml
4 20 RL Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA No GLQ Unf GasA Ex Y SBrkr Gd Typ NF Attchd RFn TA TA Y NP NF NE New Partial
In [825]:
## Impute NA values for numeric columns of validation data and prepares a dataframe,display first 5 records.
X_test_imp = num_imputer.transform(numcols_test)
X_test_imp =pd.DataFrame(X_test_imp,columns=numcols_test.columns)
X_test_imp.head()
Out[825]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 68.0 9505.0 2001.0 2001.0 180.0 0.0 0.0 884.0 884.0 884.0 1151.0 0.0 2035.0 3.0 8.0 2001.0 2.0 434.0 144.0 48.0 0.0 0.0 0.0 0.0 0.0 5.0 2010.0
1 60.0 9600.0 1900.0 1950.0 0.0 0.0 0.0 1095.0 1095.0 1095.0 679.0 0.0 1774.0 4.0 8.0 1920.0 3.0 779.0 0.0 0.0 90.0 0.0 0.0 0.0 0.0 5.0 2006.0
2 32.0 3363.0 2004.0 2004.0 117.0 0.0 0.0 976.0 976.0 976.0 732.0 0.0 1708.0 3.0 7.0 2004.0 2.0 380.0 0.0 40.0 0.0 0.0 0.0 0.0 0.0 4.0 2006.0
3 75.0 9750.0 1998.0 1998.0 0.0 975.0 0.0 133.0 1108.0 1108.0 989.0 0.0 2097.0 3.0 8.0 1998.0 2.0 583.0 253.0 170.0 0.0 0.0 0.0 0.0 0.0 6.0 2006.0
4 60.0 10930.0 1945.0 1950.0 0.0 580.0 0.0 333.0 913.0 1048.0 510.0 0.0 1558.0 3.0 6.0 1962.0 1.0 288.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.0 2008.0
In [826]:
## Check NA values for numeric columns of validation data after imputing.
X_test_imp.isna().sum()
Out[826]:
LotFrontage      0
LotArea          0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BedroomAbvGr     0
TotRmsAbvGrd     0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
dtype: int64
In [827]:
## Impute NA values for category columns of validation data and prepares a dataframe and display first 5 records.
X_test_imp_cat = cat_imputer.transform(catcols_test)
X_test_imp_cat = pd.DataFrame(X_test_imp_cat,columns=catcols_test.columns)
X_test_imp_cat.head()
Out[827]:
MSSubClass MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 60 RL Pave NAA IR1 Lvl AllPub CulDSac Gtl Gilbert Norm Norm 1Fam 2Story 7 5 Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA No Unf Unf GasA Ex Y SBrkr Gd Typ Gd BuiltIn Fin TA TA Y NP NF NE WD Normal
1 70 RM Pave Grvl Reg Lvl AllPub Inside Gtl OldTown Norm Norm 1Fam 2Story 4 2 Gable CompShg AsbShng Stucco None TA TA BrkTil TA Fa No Unf Unf GasW Fa N SBrkr TA Min2 NF 2Types Unf Fa Fa N NP NF NE WD Normal
2 160 RM Pave NAA Reg Lvl AllPub Inside Gtl Edwards Norm Norm TwnhsE 2Story 7 5 Gable CompShg VinylSd VinylSd Stone Gd TA PConc Gd TA No Unf Unf GasA Ex Y SBrkr Gd Maj1 NF Detchd Unf TA TA Y NP NF NE WD Normal
3 60 RL Pave NAA Reg Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 2Story 7 6 Gable CompShg VinylSd VinylSd None TA TA PConc Gd TA Av GLQ Unf GasA Ex Y SBrkr Gd Typ TA Detchd RFn TA TA Y NP NF NE WD Normal
4 50 RL Pave Grvl Reg Bnk AllPub Inside Gtl NAmes Artery Norm 1Fam 1.5Fin 5 6 Gable CompShg MetalSd MetalSd None TA TA CBlock TA TA No BLQ Unf GasA TA Y FuseA TA Typ TA Attchd Unf TA TA Y NP NF NE WD Normal
In [828]:
## Display dimensions of category columns of validation data.
X_test_imp_cat.shape
Out[828]:
(438, 46)
In [829]:
## Check NA values for category columns of validation data after imputation.
X_test_imp_cat.isna().sum()
Out[829]:
MSSubClass       0
MSZoning         0
Street           0
Alley            0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
OverallQual      0
OverallCond      0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
FireplaceQu      0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
PoolQC           0
Fence            0
MiscFeature      0
SaleType         0
SaleCondition    0
dtype: int64
In [831]:
## Impute numeric columns NA values of test data and prepare data frame,display first 5 records.
test_imp = num_imputer.transform(test_numcols)
test_imp =pd.DataFrame(test_imp,columns=test_numcols.columns)
test_imp.head()
Out[831]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 80.0 11622.0 1961.0 1961.0 0.0 468.0 144.0 270.0 882.0 896.0 0.0 0.0 896.0 2.0 5.0 1961.0 1.0 730.0 140.0 0.0 0.0 0.0 120.0 0.0 0.0 6.0 2010.0
1 81.0 14267.0 1958.0 1958.0 108.0 923.0 0.0 406.0 1329.0 1329.0 0.0 0.0 1329.0 3.0 6.0 1958.0 1.0 312.0 393.0 36.0 0.0 0.0 0.0 0.0 12500.0 6.0 2010.0
2 74.0 13830.0 1997.0 1998.0 0.0 791.0 0.0 137.0 928.0 928.0 701.0 0.0 1629.0 3.0 6.0 1997.0 2.0 482.0 212.0 34.0 0.0 0.0 0.0 0.0 0.0 3.0 2010.0
3 78.0 9978.0 1998.0 1998.0 20.0 602.0 0.0 324.0 926.0 926.0 678.0 0.0 1604.0 3.0 7.0 1998.0 2.0 470.0 360.0 36.0 0.0 0.0 0.0 0.0 0.0 6.0 2010.0
4 43.0 5005.0 1992.0 1992.0 0.0 263.0 0.0 1017.0 1280.0 1280.0 0.0 0.0 1280.0 2.0 5.0 1992.0 2.0 506.0 0.0 82.0 0.0 0.0 144.0 0.0 0.0 1.0 2010.0
In [832]:
## Check NA values for numeric columns of test data after imputing.
test_imp.isna().sum()
Out[832]:
LotFrontage      0
LotArea          0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BedroomAbvGr     0
TotRmsAbvGrd     0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
dtype: int64
In [833]:
## Impute NA values for category columns of test data and prepares a dataframe,display first 5 records.
test_imp_cat = cat_imputer.transform(test_catcols)
test_imp_cat = pd.DataFrame(test_imp_cat,columns=test_catcols.columns)
test_imp_cat.head()
Out[833]:
MSSubClass MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 20 RH Pave NAA Reg Lvl AllPub Inside Gtl NAmes Feedr Norm 1Fam 1Story 5 6 Gable CompShg VinylSd VinylSd None TA TA CBlock TA TA No Rec LwQ GasA TA Y SBrkr TA Typ NF Attchd Unf TA TA Y NP MnPrv NE WD Normal
1 20 RL Pave NAA IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 6 Hip CompShg Wd Sdng Wd Sdng BrkFace TA TA CBlock TA TA No ALQ Unf GasA TA Y SBrkr Gd Typ NF Attchd Unf TA TA Y NP NF Gar2 WD Normal
2 60 RL Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 5 5 Gable CompShg VinylSd VinylSd None TA TA PConc Gd TA No GLQ Unf GasA Gd Y SBrkr TA Typ TA Attchd Fin TA TA Y NP MnPrv NE WD Normal
3 60 RL Pave NAA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 6 Gable CompShg VinylSd VinylSd BrkFace TA TA PConc TA TA No GLQ Unf GasA Ex Y SBrkr Gd Typ Gd Attchd Fin TA TA Y NP NF NE WD Normal
4 120 RL Pave NAA IR1 HLS AllPub Inside Gtl StoneBr Norm Norm TwnhsE 1Story 8 5 Gable CompShg HdBoard HdBoard None Gd TA PConc Gd TA No ALQ Unf GasA Ex Y SBrkr Gd Typ NF Attchd RFn TA TA Y NP NF NE WD Normal
In [834]:
## Display dimensions of category columns of test data.
test_imp_cat.shape
Out[834]:
(1459, 46)
In [835]:
## Check NA values for category columns of test data after imputation.
test_imp_cat.isna().sum()
Out[835]:
MSSubClass       0
MSZoning         0
Street           0
Alley            0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
OverallQual      0
OverallCond      0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
FireplaceQu      0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
PoolQC           0
Fence            0
MiscFeature      0
SaleType         0
SaleCondition    0
dtype: int64
In [836]:
#################################################### Standardization  ##########################################################
In [837]:
## Import Scaler library to scale the numeric values.
from sklearn.preprocessing import StandardScaler
In [838]:
## Instantiate scaler and fit a model.
scaler = StandardScaler()
scaler.fit(X_train_imp)
Out[838]:
StandardScaler(copy=True, with_mean=True, with_std=True)
In [840]:
## Standardize numeric column values of train data,prepare a dataframe and display first 5 records.
X_train_scaler = scaler.transform(X_train_imp)
X_train_scaler = pd.DataFrame(X_train_scaler,columns=X_train_imp.columns)
X_train_scaler.head()
Out[840]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 -0.561119 -0.150083 0.842112 0.534621 -0.570667 -0.301414 -0.28229 0.009986 -0.403595 -0.628924 -0.797828 -0.1294 -1.122539 -1.03986 -0.924643 0.824828 0.333696 0.518931 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.232125 -1.396198
1 -0.653665 -0.505027 -0.986602 -0.914670 -0.570667 -0.932224 -0.28229 0.361595 -0.714639 0.353300 -0.797828 -0.1294 -0.394941 0.14623 -0.319271 0.046080 -2.330655 -2.190237 1.284957 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 1.343344 -0.506915 1.618941
2 0.040432 -0.183080 -0.104901 -0.866360 -0.570667 0.850133 -0.28229 -1.121474 -0.311597 -0.633935 -0.797828 -0.1294 -1.126251 0.14623 -0.924643 -0.158854 -0.998480 -0.609889 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941
3 0.688257 -0.169171 -0.170212 -0.962979 -0.009994 0.428887 -0.28229 -0.276260 0.073922 0.398402 -0.797828 -0.1294 -0.361531 0.14623 -0.319271 -0.527734 0.333696 -0.120733 -0.749421 1.434220 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 2.079725 -1.396198
4 -0.237207 -0.361669 1.201324 1.066028 0.028864 0.473340 -0.28229 -0.053123 0.336776 0.107744 -0.797828 -0.1294 -0.576840 -1.03986 -0.319271 1.193708 0.333696 0.763509 0.062783 0.143927 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.971165 -0.642413
In [842]:
## Standardize numeric column values of validation data,prepare a dataframe and display first 5 records.
X_test_scaler = scaler.transform(X_test_imp)
X_test_scaler = pd.DataFrame(X_test_scaler,columns=X_test_imp.columns)
X_test_scaler.head()
Out[842]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 -0.052114 -0.083176 1.005390 0.776169 0.428552 -0.932224 -0.28229 0.713204 -0.372929 -0.704094 1.847518 -0.1294 0.958169 0.146230 0.891472 0.947788 0.333696 -0.148954 0.364459 0.051763 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941
1 -0.422299 -0.073531 -2.292826 -1.687625 -0.570667 -0.932224 -0.28229 1.188778 0.089255 -0.175397 0.762719 -0.1294 0.473722 1.332321 0.891472 -2.372137 1.665871 1.473725 -0.749421 -0.685547 1.088712 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 -1.396198
2 -1.717949 -0.706764 1.103357 0.921098 0.078825 -0.932224 -0.28229 0.920563 -0.171408 -0.473572 0.884529 -0.1294 0.351219 0.146230 0.286100 1.070748 0.333696 -0.402938 -0.749421 -0.071122 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 -1.396198
3 0.271798 -0.058301 0.907423 0.631240 -0.570667 1.131669 -0.28229 -0.979478 0.117731 -0.142823 1.475193 -0.1294 1.073249 0.146230 0.891472 0.824828 0.333696 0.551855 1.207604 1.925761 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.137395 -1.396198
4 -0.422299 0.061502 -0.823324 -1.687625 -0.570667 0.295527 -0.28229 -0.528697 -0.309406 -0.293164 0.374306 -0.1294 0.072801 0.146230 -0.319271 -0.650694 -0.998480 -0.835653 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 0.111371
In [844]:
## Standardize numeric column values of test data,prepare a dataframe and display first 5 records.
test_scaler = scaler.transform(test_imp)
test_scaler = pd.DataFrame(test_scaler,columns=test_imp.columns)
test_scaler.head()
Out[844]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold
0 0.503164 0.131760 -0.300834 -1.156218 -0.570667 0.058444 0.595815 -0.670693 -0.377310 -0.674026 -0.797828 -0.1294 -1.155949 -1.03986 -0.924643 -0.691681 -0.998480 1.243258 0.333518 -0.685547 -0.360803 -0.112837 1.843566 -0.069193 -0.108754 -0.137395 1.618941
1 0.549437 0.400303 -0.398801 -1.301147 0.028864 1.021594 -0.282290 -0.364162 0.601820 0.410930 -0.797828 -0.1294 -0.352250 0.14623 -0.319271 -0.814641 -0.998480 -0.722771 2.290543 -0.132564 -0.360803 -0.112837 -0.271032 -0.069193 40.227307 -0.137395 1.618941
2 0.225525 0.355935 0.874768 0.631240 -0.570667 0.742175 -0.282290 -0.970462 -0.276549 -0.593844 0.813282 -0.1294 0.204585 0.14623 -0.319271 0.783841 0.333696 0.076810 0.890458 -0.163286 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -1.245955 1.618941
3 0.410618 -0.035153 0.907423 0.631240 -0.459643 0.342097 -0.282290 -0.548982 -0.280930 -0.598856 0.760421 -0.1294 0.158182 0.14623 0.286100 0.824828 0.333696 0.020369 2.035278 -0.132564 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.137395 1.618941
4 -1.208944 -0.540054 0.711490 0.341382 -0.570667 -0.375502 -0.282290 1.012973 0.494488 0.288152 -0.797828 -0.1294 -0.443200 -1.03986 -0.924643 0.578907 0.333696 0.189692 -0.749421 0.574025 -0.360803 -0.112837 2.266486 -0.069193 -0.108754 -1.984995 1.618941
In [845]:
## Combine numeric  and category columns of train data and display dimesions of result dataframe.
train_result = ""
train_result = pd.concat([X_train_scaler, X_train_imp_cat], axis=1)
train_result.shape
Out[845]:
(1022, 73)
In [846]:
## Check first 5 records of train data.
train_result.head()
Out[846]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold MSSubClass MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 -0.561119 -0.150083 0.842112 0.534621 -0.570667 -0.301414 -0.28229 0.009986 -0.403595 -0.628924 -0.797828 -0.1294 -1.122539 -1.03986 -0.924643 0.824828 0.333696 0.518931 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.232125 -1.396198 85 RL Pave NAA IR1 Lvl AllPub CulDSac Gtl CollgCr Norm Norm 1Fam SFoyer 5 5 Gable CompShg VinylSd VinylSd None Gd TA PConc Gd TA Av GLQ Unf GasA Ex Y SBrkr TA Typ NF Detchd Unf TA TA Y NP NF NE WD Normal
1 -0.653665 -0.505027 -0.986602 -0.914670 -0.570667 -0.932224 -0.28229 0.361595 -0.714639 0.353300 -0.797828 -0.1294 -0.394941 0.14623 -0.319271 0.046080 -2.330655 -2.190237 1.284957 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 1.343344 -0.506915 1.618941 30 RL Pave NAA IR1 Lvl AllPub Inside Gtl BrkSide Norm Norm 1Fam 1Story 3 2 Gable CompShg Wd Sdng Plywood None TA Po CBlock TA TA No Unf Unf GasA Ex Y SBrkr Fa Mod NF NG NG NG NG Y NP GdWo Shed WD Normal
2 0.040432 -0.183080 -0.104901 -0.866360 -0.570667 0.850133 -0.28229 -1.121474 -0.311597 -0.633935 -0.797828 -0.1294 -1.126251 0.14623 -0.924643 -0.158854 -0.998480 -0.609889 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941 20 RL Pave NAA Reg Lvl AllPub FR2 Gtl Sawyer Feedr Norm 1Fam 1Story 5 5 Gable CompShg HdBoard HdBoard None TA TA CBlock TA TA No ALQ Unf GasA TA Y SBrkr TA Typ Fa Detchd Unf TA TA Y NP MnPrv NE WD Normal
3 0.688257 -0.169171 -0.170212 -0.962979 -0.009994 0.428887 -0.28229 -0.276260 0.073922 0.398402 -0.797828 -0.1294 -0.361531 0.14623 -0.319271 -0.527734 0.333696 -0.120733 -0.749421 1.434220 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 2.079725 -1.396198 20 RL Pave NAA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 6 5 Gable CompShg Wd Sdng Wd Sdng BrkFace TA TA CBlock TA TA No Rec Unf GasA Ex Y SBrkr TA Typ TA Attchd RFn TA TA Y NP GdWo NE WD Abnorml
4 -0.237207 -0.361669 1.201324 1.066028 0.028864 0.473340 -0.28229 -0.053123 0.336776 0.107744 -0.797828 -0.1294 -0.576840 -1.03986 -0.319271 1.193708 0.333696 0.763509 0.062783 0.143927 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.971165 -0.642413 20 RL Pave NAA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA No GLQ Unf GasA Ex Y SBrkr Gd Typ NF Attchd RFn TA TA Y NP NF NE New Partial
In [847]:
## Check NA values for train data.
train_result.isna().sum()
Out[847]:
LotFrontage      0
LotArea          0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BedroomAbvGr     0
TotRmsAbvGrd     0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
MSSubClass       0
MSZoning         0
Street           0
Alley            0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
OverallQual      0
OverallCond      0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
FireplaceQu      0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
PoolQC           0
Fence            0
MiscFeature      0
SaleType         0
SaleCondition    0
dtype: int64
In [848]:
## Prepare a dataframe with train data.
dataframe1 = pd.DataFrame(train_result)
In [849]:
## Copy dataframe data into a CSV file.
dataframe1.to_csv('TrainDataPreprocess.csv',index=False)
In [850]:
## Combine numeric  and category columns of validation data.
test_result = ""
test_result = pd.concat([X_test_scaler, X_test_imp_cat], axis=1)
In [851]:
## Check dimesions of validation data.
test_result.shape
Out[851]:
(438, 73)
In [852]:
## Get first 5 records of validation data.
test_result.head()
Out[852]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold MSSubClass MSZoning Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinType2 Heating HeatingQC CentralAir Electrical KitchenQual Functional FireplaceQu GarageType GarageFinish GarageQual GarageCond PavedDrive PoolQC Fence MiscFeature SaleType SaleCondition
0 -0.052114 -0.083176 1.005390 0.776169 0.428552 -0.932224 -0.28229 0.713204 -0.372929 -0.704094 1.847518 -0.1294 0.958169 0.146230 0.891472 0.947788 0.333696 -0.148954 0.364459 0.051763 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941 60 RL Pave NAA IR1 Lvl AllPub CulDSac Gtl Gilbert Norm Norm 1Fam 2Story 7 5 Gable CompShg VinylSd VinylSd BrkFace Gd TA PConc Gd TA No Unf Unf GasA Ex Y SBrkr Gd Typ Gd BuiltIn Fin TA TA Y NP NF NE WD Normal
1 -0.422299 -0.073531 -2.292826 -1.687625 -0.570667 -0.932224 -0.28229 1.188778 0.089255 -0.175397 0.762719 -0.1294 0.473722 1.332321 0.891472 -2.372137 1.665871 1.473725 -0.749421 -0.685547 1.088712 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 -1.396198 70 RM Pave Grvl Reg Lvl AllPub Inside Gtl OldTown Norm Norm 1Fam 2Story 4 2 Gable CompShg AsbShng Stucco None TA TA BrkTil TA Fa No Unf Unf GasW Fa N SBrkr TA Min2 NF 2Types Unf Fa Fa N NP NF NE WD Normal
2 -1.717949 -0.706764 1.103357 0.921098 0.078825 -0.932224 -0.28229 0.920563 -0.171408 -0.473572 0.884529 -0.1294 0.351219 0.146230 0.286100 1.070748 0.333696 -0.402938 -0.749421 -0.071122 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 -1.396198 160 RM Pave NAA Reg Lvl AllPub Inside Gtl Edwards Norm Norm TwnhsE 2Story 7 5 Gable CompShg VinylSd VinylSd Stone Gd TA PConc Gd TA No Unf Unf GasA Ex Y SBrkr Gd Maj1 NF Detchd Unf TA TA Y NP NF NE WD Normal
3 0.271798 -0.058301 0.907423 0.631240 -0.570667 1.131669 -0.28229 -0.979478 0.117731 -0.142823 1.475193 -0.1294 1.073249 0.146230 0.891472 0.824828 0.333696 0.551855 1.207604 1.925761 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.137395 -1.396198 60 RL Pave NAA Reg Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 2Story 7 6 Gable CompShg VinylSd VinylSd None TA TA PConc Gd TA Av GLQ Unf GasA Ex Y SBrkr Gd Typ TA Detchd RFn TA TA Y NP NF NE WD Normal
4 -0.422299 0.061502 -0.823324 -1.687625 -0.570667 0.295527 -0.28229 -0.528697 -0.309406 -0.293164 0.374306 -0.1294 0.072801 0.146230 -0.319271 -0.650694 -0.998480 -0.835653 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 0.111371 50 RL Pave Grvl Reg Bnk AllPub Inside Gtl NAmes Artery Norm 1Fam 1.5Fin 5 6 Gable CompShg MetalSd MetalSd None TA TA CBlock TA TA No BLQ Unf GasA TA Y FuseA TA Typ TA Attchd Unf TA TA Y NP NF NE WD Normal
In [853]:
## Check NA values for validation data after combining numeric and category columns.
test_result.isna().sum()
Out[853]:
LotFrontage      0
LotArea          0
YearBuilt        0
YearRemodAdd     0
MasVnrArea       0
BsmtFinSF1       0
BsmtFinSF2       0
BsmtUnfSF        0
TotalBsmtSF      0
1stFlrSF         0
2ndFlrSF         0
LowQualFinSF     0
GrLivArea        0
BedroomAbvGr     0
TotRmsAbvGrd     0
GarageYrBlt      0
GarageCars       0
GarageArea       0
WoodDeckSF       0
OpenPorchSF      0
EnclosedPorch    0
3SsnPorch        0
ScreenPorch      0
PoolArea         0
MiscVal          0
MoSold           0
YrSold           0
MSSubClass       0
MSZoning         0
Street           0
Alley            0
LotShape         0
LandContour      0
Utilities        0
LotConfig        0
LandSlope        0
Neighborhood     0
Condition1       0
Condition2       0
BldgType         0
HouseStyle       0
OverallQual      0
OverallCond      0
RoofStyle        0
RoofMatl         0
Exterior1st      0
Exterior2nd      0
MasVnrType       0
ExterQual        0
ExterCond        0
Foundation       0
BsmtQual         0
BsmtCond         0
BsmtExposure     0
BsmtFinType1     0
BsmtFinType2     0
Heating          0
HeatingQC        0
CentralAir       0
Electrical       0
KitchenQual      0
Functional       0
FireplaceQu      0
GarageType       0
GarageFinish     0
GarageQual       0
GarageCond       0
PavedDrive       0
PoolQC           0
Fence            0
MiscFeature      0
SaleType         0
SaleCondition    0
dtype: int64
In [854]:
## Prepare dataframe with validation data.
dataframe2 = pd.DataFrame(test_result)
In [855]:
## Copy dataframe data into a CSV file.
dataframe2.to_csv('ValidationDataPreprocess.csv',index=False)
In [856]:
## Prepare a dataframe with target column data of train.
dataframe3 = pd.DataFrame(y_train)
In [857]:
## Copy dataframe data into a CSV file.
dataframe3.to_csv('TrainTarget.csv',index=False)
In [858]:
## Prepare a dataframe with target column data of validation.
dataframe4 = pd.DataFrame(y_test)
In [859]:
## Copy dataframe data into a CSV file.
dataframe4.to_csv('ValidationTarget.csv',index=False)
In [860]:
################################################## Dummification###############################################################
In [862]:
## Display category column levels of train data.
for i in X_train_imp_cat:
    print(i , X_train_imp_cat[i].nunique())
MSSubClass 15
MSZoning 5
Street 2
Alley 3
LotShape 4
LandContour 4
Utilities 2
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 9
Condition2 8
BldgType 5
HouseStyle 8
OverallQual 10
OverallCond 9
RoofStyle 6
RoofMatl 7
Exterior1st 13
Exterior2nd 16
MasVnrType 5
ExterQual 4
ExterCond 5
Foundation 6
BsmtQual 5
BsmtCond 5
BsmtExposure 5
BsmtFinType1 7
BsmtFinType2 7
Heating 5
HeatingQC 5
CentralAir 2
Electrical 6
KitchenQual 4
Functional 6
FireplaceQu 6
GarageType 7
GarageFinish 4
GarageQual 6
GarageCond 6
PavedDrive 3
PoolQC 4
Fence 5
MiscFeature 4
SaleType 9
SaleCondition 6
In [863]:
## Display dimensions of train data.
X_train_imp_cat.shape
Out[863]:
(1022, 46)
In [864]:
## Get dummies for category columns of train data,display dimesnionns and first 5 records.
catcols_train_dummy = pd.get_dummies(columns = X_train_imp_cat.columns, data = X_train_imp_cat, drop_first= True)
print(catcols_train_dummy.shape)
catcols_train_dummy.head()
(1022, 250)
Out[864]:
MSSubClass_160 MSSubClass_180 MSSubClass_190 MSSubClass_20 MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60 MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90 MSZoning_FV MSZoning_RH MSZoning_RL MSZoning_RM Street_Pave Alley_NAA Alley_Pave LotShape_IR2 LotShape_IR3 LotShape_Reg LandContour_HLS LandContour_Low LandContour_Lvl Utilities_NoSeWa LotConfig_CulDSac LotConfig_FR2 LotConfig_FR3 LotConfig_Inside LandSlope_Mod LandSlope_Sev Neighborhood_Blueste Neighborhood_BrDale Neighborhood_BrkSide Neighborhood_ClearCr Neighborhood_CollgCr Neighborhood_Crawfor Neighborhood_Edwards Neighborhood_Gilbert Neighborhood_IDOTRR Neighborhood_MeadowV Neighborhood_Mitchel Neighborhood_NAmes Neighborhood_NPkVill Neighborhood_NWAmes Neighborhood_NoRidge Neighborhood_NridgHt Neighborhood_OldTown Neighborhood_SWISU Neighborhood_Sawyer Neighborhood_SawyerW Neighborhood_Somerst Neighborhood_StoneBr Neighborhood_Timber Neighborhood_Veenker Condition1_Feedr Condition1_Norm Condition1_PosA Condition1_PosN Condition1_RRAe Condition1_RRAn Condition1_RRNe Condition1_RRNn Condition2_Feedr Condition2_Norm Condition2_PosA Condition2_PosN Condition2_RRAe Condition2_RRAn Condition2_RRNn BldgType_2fmCon BldgType_Duplex BldgType_Twnhs BldgType_TwnhsE HouseStyle_1.5Unf HouseStyle_1Story HouseStyle_2.5Fin HouseStyle_2.5Unf HouseStyle_2Story HouseStyle_SFoyer HouseStyle_SLvl OverallQual_10 OverallQual_2 OverallQual_3 OverallQual_4 OverallQual_5 OverallQual_6 OverallQual_7 OverallQual_8 OverallQual_9 OverallCond_2 OverallCond_3 OverallCond_4 OverallCond_5 OverallCond_6 OverallCond_7 OverallCond_8 ... Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone Foundation_Wood BsmtQual_Fa BsmtQual_Gd BsmtQual_NB BsmtQual_TA BsmtCond_Gd BsmtCond_NB BsmtCond_Po BsmtCond_TA BsmtExposure_Gd BsmtExposure_Mn BsmtExposure_NB BsmtExposure_No BsmtFinType1_BLQ BsmtFinType1_GLQ BsmtFinType1_LwQ BsmtFinType1_NB BsmtFinType1_Rec BsmtFinType1_Unf BsmtFinType2_BLQ BsmtFinType2_GLQ BsmtFinType2_LwQ BsmtFinType2_NB BsmtFinType2_Rec BsmtFinType2_Unf Heating_GasW Heating_Grav Heating_OthW Heating_Wall HeatingQC_Fa HeatingQC_Gd HeatingQC_Po HeatingQC_TA CentralAir_Y Electrical_FuseF Electrical_FuseP Electrical_Mix Electrical_SBrkr Electrical_nan KitchenQual_Fa KitchenQual_Gd KitchenQual_TA Functional_Maj2 Functional_Min1 Functional_Min2 Functional_Mod Functional_Typ FireplaceQu_Fa FireplaceQu_Gd FireplaceQu_NF FireplaceQu_Po FireplaceQu_TA GarageType_Attchd GarageType_Basment GarageType_BuiltIn GarageType_CarPort GarageType_Detchd GarageType_NG GarageFinish_NG GarageFinish_RFn GarageFinish_Unf GarageQual_Fa GarageQual_Gd GarageQual_NG GarageQual_Po GarageQual_TA GarageCond_Fa GarageCond_Gd GarageCond_NG GarageCond_Po GarageCond_TA PavedDrive_P PavedDrive_Y PoolQC_Fa PoolQC_Gd PoolQC_NP Fence_GdWo Fence_MnPrv Fence_MnWw Fence_NF MiscFeature_NE MiscFeature_Shed MiscFeature_TenC SaleType_CWD SaleType_Con SaleType_ConLD SaleType_ConLI SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0
2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1

5 rows × 250 columns

In [866]:
## Display category column levels of validation data.
for i in X_test_imp_cat:
    print(i , X_test_imp_cat[i].nunique())
MSSubClass 15
MSZoning 4
Street 2
Alley 3
LotShape 4
LandContour 4
Utilities 1
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 8
Condition2 2
BldgType 5
HouseStyle 8
OverallQual 8
OverallCond 8
RoofStyle 5
RoofMatl 5
Exterior1st 14
Exterior2nd 14
MasVnrType 5
ExterQual 4
ExterCond 4
Foundation 6
BsmtQual 5
BsmtCond 4
BsmtExposure 5
BsmtFinType1 7
BsmtFinType2 7
Heating 5
HeatingQC 4
CentralAir 2
Electrical 4
KitchenQual 4
Functional 7
FireplaceQu 6
GarageType 7
GarageFinish 4
GarageQual 6
GarageCond 6
PavedDrive 3
PoolQC 3
Fence 5
MiscFeature 4
SaleType 6
SaleCondition 6
In [867]:
## Display dimesnions of category columns of validation data.
X_test_imp_cat.shape
Out[867]:
(438, 46)
In [872]:
## Get dummies for category columns of train data,display dimesnionns and first 5 records.
catcols_test_dummy = pd.get_dummies(columns = X_test_imp_cat.columns, data = X_test_imp_cat, drop_first= True)
print(catcols_test_dummy.shape)
catcols_test_dummy.head()
(438, 226)
Out[872]:
MSSubClass_160 MSSubClass_180 MSSubClass_190 MSSubClass_20 MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60 MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90 MSZoning_RH MSZoning_RL MSZoning_RM Street_Pave Alley_NAA Alley_Pave LotShape_IR2 LotShape_IR3 LotShape_Reg LandContour_HLS LandContour_Low LandContour_Lvl LotConfig_CulDSac LotConfig_FR2 LotConfig_FR3 LotConfig_Inside LandSlope_Mod LandSlope_Sev Neighborhood_Blueste Neighborhood_BrDale Neighborhood_BrkSide Neighborhood_ClearCr Neighborhood_CollgCr Neighborhood_Crawfor Neighborhood_Edwards Neighborhood_Gilbert Neighborhood_IDOTRR Neighborhood_MeadowV Neighborhood_Mitchel Neighborhood_NAmes Neighborhood_NPkVill Neighborhood_NWAmes Neighborhood_NoRidge Neighborhood_NridgHt Neighborhood_OldTown Neighborhood_SWISU Neighborhood_Sawyer Neighborhood_SawyerW Neighborhood_Somerst Neighborhood_StoneBr Neighborhood_Timber Neighborhood_Veenker Condition1_Feedr Condition1_Norm Condition1_PosA Condition1_PosN Condition1_RRAe Condition1_RRAn Condition1_RRNn Condition2_Norm BldgType_2fmCon BldgType_Duplex BldgType_Twnhs BldgType_TwnhsE HouseStyle_1.5Unf HouseStyle_1Story HouseStyle_2.5Fin HouseStyle_2.5Unf HouseStyle_2Story HouseStyle_SFoyer HouseStyle_SLvl OverallQual_3 OverallQual_4 OverallQual_5 OverallQual_6 OverallQual_7 OverallQual_8 OverallQual_9 OverallCond_3 OverallCond_4 OverallCond_5 OverallCond_6 OverallCond_7 OverallCond_8 OverallCond_9 RoofStyle_Gable RoofStyle_Hip RoofStyle_Mansard RoofStyle_Shed RoofMatl_Metal RoofMatl_Tar&Grv RoofMatl_WdShake RoofMatl_WdShngl Exterior1st_AsphShn Exterior1st_BrkComm Exterior1st_BrkFace ... MasVnrType_nan ExterQual_Fa ExterQual_Gd ExterQual_TA ExterCond_Fa ExterCond_Gd ExterCond_TA Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone Foundation_Wood BsmtQual_Fa BsmtQual_Gd BsmtQual_NB BsmtQual_TA BsmtCond_Gd BsmtCond_NB BsmtCond_TA BsmtExposure_Gd BsmtExposure_Mn BsmtExposure_NB BsmtExposure_No BsmtFinType1_BLQ BsmtFinType1_GLQ BsmtFinType1_LwQ BsmtFinType1_NB BsmtFinType1_Rec BsmtFinType1_Unf BsmtFinType2_BLQ BsmtFinType2_GLQ BsmtFinType2_LwQ BsmtFinType2_NB BsmtFinType2_Rec BsmtFinType2_Unf Heating_GasA Heating_GasW Heating_Grav Heating_Wall HeatingQC_Fa HeatingQC_Gd HeatingQC_TA CentralAir_Y Electrical_FuseF Electrical_FuseP Electrical_SBrkr KitchenQual_Fa KitchenQual_Gd KitchenQual_TA Functional_Maj2 Functional_Min1 Functional_Min2 Functional_Mod Functional_Sev Functional_Typ FireplaceQu_Fa FireplaceQu_Gd FireplaceQu_NF FireplaceQu_Po FireplaceQu_TA GarageType_Attchd GarageType_Basment GarageType_BuiltIn GarageType_CarPort GarageType_Detchd GarageType_NG GarageFinish_NG GarageFinish_RFn GarageFinish_Unf GarageQual_Fa GarageQual_Gd GarageQual_NG GarageQual_Po GarageQual_TA GarageCond_Fa GarageCond_Gd GarageCond_NG GarageCond_Po GarageCond_TA PavedDrive_P PavedDrive_Y PoolQC_Gd PoolQC_NP Fence_GdWo Fence_MnPrv Fence_MnWw Fence_NF MiscFeature_NE MiscFeature_Othr MiscFeature_Shed SaleType_ConLD SaleType_ConLI SaleType_ConLw SaleType_New SaleType_WD SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0
3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0
4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0

5 rows × 226 columns

In [873]:
#train_levels,test_levels = catcols_train_dummy.align(catcols_test_dummy, join='outer', axis=1, fill_value=0)
In [877]:
## Get missing columns in the validation data test.
missing_cols = set( catcols_train_dummy.columns ) - set( catcols_test_dummy.columns )
## Add a missing column in test set with default value equal to 0.
for c in missing_cols:
    catcols_test_dummy[c] = 0
catcols_test_dummy = catcols_test_dummy[catcols_train_dummy.columns]
In [878]:
## Display dimensions of category columns of train data.
catcols_train_dummy.shape
Out[878]:
(1022, 250)
In [879]:
## Display dimensions of category columns of validation data.
catcols_test_dummy.shape
Out[879]:
(438, 250)
In [881]:
## Combine numeric and category columns of train data.
train_data_final = pd.concat([X_train_scaler, catcols_train_dummy], axis=1)
In [882]:
## Check dimesnions of train data.
train_data_final.shape
Out[882]:
(1022, 277)
In [883]:
## Check dimesnions of target varible of train data.
y_train.shape
Out[883]:
(1022,)
In [884]:
## Check first 5 records of train ata.
train_data_final.head()
Out[884]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold MSSubClass_160 MSSubClass_180 MSSubClass_190 MSSubClass_20 MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60 MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90 MSZoning_FV MSZoning_RH MSZoning_RL MSZoning_RM Street_Pave Alley_NAA Alley_Pave LotShape_IR2 LotShape_IR3 LotShape_Reg LandContour_HLS LandContour_Low LandContour_Lvl Utilities_NoSeWa LotConfig_CulDSac LotConfig_FR2 LotConfig_FR3 LotConfig_Inside LandSlope_Mod LandSlope_Sev Neighborhood_Blueste Neighborhood_BrDale Neighborhood_BrkSide Neighborhood_ClearCr Neighborhood_CollgCr Neighborhood_Crawfor Neighborhood_Edwards Neighborhood_Gilbert Neighborhood_IDOTRR Neighborhood_MeadowV Neighborhood_Mitchel Neighborhood_NAmes Neighborhood_NPkVill Neighborhood_NWAmes Neighborhood_NoRidge Neighborhood_NridgHt Neighborhood_OldTown Neighborhood_SWISU Neighborhood_Sawyer Neighborhood_SawyerW Neighborhood_Somerst Neighborhood_StoneBr Neighborhood_Timber Neighborhood_Veenker Condition1_Feedr Condition1_Norm Condition1_PosA Condition1_PosN Condition1_RRAe Condition1_RRAn Condition1_RRNe Condition1_RRNn Condition2_Feedr Condition2_Norm Condition2_PosA Condition2_PosN Condition2_RRAe Condition2_RRAn Condition2_RRNn ... Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone Foundation_Wood BsmtQual_Fa BsmtQual_Gd BsmtQual_NB BsmtQual_TA BsmtCond_Gd BsmtCond_NB BsmtCond_Po BsmtCond_TA BsmtExposure_Gd BsmtExposure_Mn BsmtExposure_NB BsmtExposure_No BsmtFinType1_BLQ BsmtFinType1_GLQ BsmtFinType1_LwQ BsmtFinType1_NB BsmtFinType1_Rec BsmtFinType1_Unf BsmtFinType2_BLQ BsmtFinType2_GLQ BsmtFinType2_LwQ BsmtFinType2_NB BsmtFinType2_Rec BsmtFinType2_Unf Heating_GasW Heating_Grav Heating_OthW Heating_Wall HeatingQC_Fa HeatingQC_Gd HeatingQC_Po HeatingQC_TA CentralAir_Y Electrical_FuseF Electrical_FuseP Electrical_Mix Electrical_SBrkr Electrical_nan KitchenQual_Fa KitchenQual_Gd KitchenQual_TA Functional_Maj2 Functional_Min1 Functional_Min2 Functional_Mod Functional_Typ FireplaceQu_Fa FireplaceQu_Gd FireplaceQu_NF FireplaceQu_Po FireplaceQu_TA GarageType_Attchd GarageType_Basment GarageType_BuiltIn GarageType_CarPort GarageType_Detchd GarageType_NG GarageFinish_NG GarageFinish_RFn GarageFinish_Unf GarageQual_Fa GarageQual_Gd GarageQual_NG GarageQual_Po GarageQual_TA GarageCond_Fa GarageCond_Gd GarageCond_NG GarageCond_Po GarageCond_TA PavedDrive_P PavedDrive_Y PoolQC_Fa PoolQC_Gd PoolQC_NP Fence_GdWo Fence_MnPrv Fence_MnWw Fence_NF MiscFeature_NE MiscFeature_Shed MiscFeature_TenC SaleType_CWD SaleType_Con SaleType_ConLD SaleType_ConLI SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 -0.561119 -0.150083 0.842112 0.534621 -0.570667 -0.301414 -0.28229 0.009986 -0.403595 -0.628924 -0.797828 -0.1294 -1.122539 -1.03986 -0.924643 0.824828 0.333696 0.518931 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.232125 -1.396198 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
1 -0.653665 -0.505027 -0.986602 -0.914670 -0.570667 -0.932224 -0.28229 0.361595 -0.714639 0.353300 -0.797828 -0.1294 -0.394941 0.14623 -0.319271 0.046080 -2.330655 -2.190237 1.284957 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 1.343344 -0.506915 1.618941 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0
2 0.040432 -0.183080 -0.104901 -0.866360 -0.570667 0.850133 -0.28229 -1.121474 -0.311597 -0.633935 -0.797828 -0.1294 -1.126251 0.14623 -0.924643 -0.158854 -0.998480 -0.609889 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
3 0.688257 -0.169171 -0.170212 -0.962979 -0.009994 0.428887 -0.28229 -0.276260 0.073922 0.398402 -0.797828 -0.1294 -0.361531 0.14623 -0.319271 -0.527734 0.333696 -0.120733 -0.749421 1.434220 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 2.079725 -1.396198 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4 -0.237207 -0.361669 1.201324 1.066028 0.028864 0.473340 -0.28229 -0.053123 0.336776 0.107744 -0.797828 -0.1294 -0.576840 -1.03986 -0.319271 1.193708 0.333696 0.763509 0.062783 0.143927 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 0.971165 -0.642413 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1

5 rows × 277 columns

In [885]:
## Combine numeric and category columns of validation data.
test_data_final = pd.concat([X_test_scaler, catcols_test_dummy], axis=1)
In [886]:
## Display dimensions of validation data.
test_data_final.shape
Out[886]:
(438, 277)
In [887]:
## Get first 5 records of validation data.
test_data_final.head()
Out[887]:
LotFrontage LotArea YearBuilt YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF TotalBsmtSF 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BedroomAbvGr TotRmsAbvGrd GarageYrBlt GarageCars GarageArea WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea MiscVal MoSold YrSold MSSubClass_160 MSSubClass_180 MSSubClass_190 MSSubClass_20 MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60 MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90 MSZoning_FV MSZoning_RH MSZoning_RL MSZoning_RM Street_Pave Alley_NAA Alley_Pave LotShape_IR2 LotShape_IR3 LotShape_Reg LandContour_HLS LandContour_Low LandContour_Lvl Utilities_NoSeWa LotConfig_CulDSac LotConfig_FR2 LotConfig_FR3 LotConfig_Inside LandSlope_Mod LandSlope_Sev Neighborhood_Blueste Neighborhood_BrDale Neighborhood_BrkSide Neighborhood_ClearCr Neighborhood_CollgCr Neighborhood_Crawfor Neighborhood_Edwards Neighborhood_Gilbert Neighborhood_IDOTRR Neighborhood_MeadowV Neighborhood_Mitchel Neighborhood_NAmes Neighborhood_NPkVill Neighborhood_NWAmes Neighborhood_NoRidge Neighborhood_NridgHt Neighborhood_OldTown Neighborhood_SWISU Neighborhood_Sawyer Neighborhood_SawyerW Neighborhood_Somerst Neighborhood_StoneBr Neighborhood_Timber Neighborhood_Veenker Condition1_Feedr Condition1_Norm Condition1_PosA Condition1_PosN Condition1_RRAe Condition1_RRAn Condition1_RRNe Condition1_RRNn Condition2_Feedr Condition2_Norm Condition2_PosA Condition2_PosN Condition2_RRAe Condition2_RRAn Condition2_RRNn ... Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone Foundation_Wood BsmtQual_Fa BsmtQual_Gd BsmtQual_NB BsmtQual_TA BsmtCond_Gd BsmtCond_NB BsmtCond_Po BsmtCond_TA BsmtExposure_Gd BsmtExposure_Mn BsmtExposure_NB BsmtExposure_No BsmtFinType1_BLQ BsmtFinType1_GLQ BsmtFinType1_LwQ BsmtFinType1_NB BsmtFinType1_Rec BsmtFinType1_Unf BsmtFinType2_BLQ BsmtFinType2_GLQ BsmtFinType2_LwQ BsmtFinType2_NB BsmtFinType2_Rec BsmtFinType2_Unf Heating_GasW Heating_Grav Heating_OthW Heating_Wall HeatingQC_Fa HeatingQC_Gd HeatingQC_Po HeatingQC_TA CentralAir_Y Electrical_FuseF Electrical_FuseP Electrical_Mix Electrical_SBrkr Electrical_nan KitchenQual_Fa KitchenQual_Gd KitchenQual_TA Functional_Maj2 Functional_Min1 Functional_Min2 Functional_Mod Functional_Typ FireplaceQu_Fa FireplaceQu_Gd FireplaceQu_NF FireplaceQu_Po FireplaceQu_TA GarageType_Attchd GarageType_Basment GarageType_BuiltIn GarageType_CarPort GarageType_Detchd GarageType_NG GarageFinish_NG GarageFinish_RFn GarageFinish_Unf GarageQual_Fa GarageQual_Gd GarageQual_NG GarageQual_Po GarageQual_TA GarageCond_Fa GarageCond_Gd GarageCond_NG GarageCond_Po GarageCond_TA PavedDrive_P PavedDrive_Y PoolQC_Fa PoolQC_Gd PoolQC_NP Fence_GdWo Fence_MnPrv Fence_MnWw Fence_NF MiscFeature_NE MiscFeature_Shed MiscFeature_TenC SaleType_CWD SaleType_Con SaleType_ConLD SaleType_ConLI SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 -0.052114 -0.083176 1.005390 0.776169 0.428552 -0.932224 -0.28229 0.713204 -0.372929 -0.704094 1.847518 -0.1294 0.958169 0.146230 0.891472 0.947788 0.333696 -0.148954 0.364459 0.051763 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 1.618941 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
1 -0.422299 -0.073531 -2.292826 -1.687625 -0.570667 -0.932224 -0.28229 1.188778 0.089255 -0.175397 0.762719 -0.1294 0.473722 1.332321 0.891472 -2.372137 1.665871 1.473725 -0.749421 -0.685547 1.088712 -0.112837 -0.271032 -0.069193 -0.108754 -0.506915 -1.396198 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
2 -1.717949 -0.706764 1.103357 0.921098 0.078825 -0.932224 -0.28229 0.920563 -0.171408 -0.473572 0.884529 -0.1294 0.351219 0.146230 0.286100 1.070748 0.333696 -0.402938 -0.749421 -0.071122 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 -1.396198 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
3 0.271798 -0.058301 0.907423 0.631240 -0.570667 1.131669 -0.28229 -0.979478 0.117731 -0.142823 1.475193 -0.1294 1.073249 0.146230 0.891472 0.824828 0.333696 0.551855 1.207604 1.925761 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.137395 -1.396198 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
4 -0.422299 0.061502 -0.823324 -1.687625 -0.570667 0.295527 -0.28229 -0.528697 -0.309406 -0.293164 0.374306 -0.1294 0.072801 0.146230 -0.319271 -0.650694 -0.998480 -0.835653 -0.749421 -0.685547 -0.360803 -0.112837 -0.271032 -0.069193 -0.108754 -0.876435 0.111371 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0

5 rows × 277 columns

In [889]:
## Display category columns levels of test data.
for i in test_imp_cat:
    print(i , test_imp_cat[i].nunique())
MSSubClass 16
MSZoning 6
Street 2
Alley 3
LotShape 4
LandContour 4
Utilities 2
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 9
Condition2 5
BldgType 5
HouseStyle 7
OverallQual 10
OverallCond 9
RoofStyle 6
RoofMatl 4
Exterior1st 14
Exterior2nd 16
MasVnrType 5
ExterQual 4
ExterCond 5
Foundation 6
BsmtQual 5
BsmtCond 5
BsmtExposure 5
BsmtFinType1 7
BsmtFinType2 7
Heating 4
HeatingQC 5
CentralAir 2
Electrical 4
KitchenQual 5
Functional 8
FireplaceQu 6
GarageType 7
GarageFinish 4
GarageQual 5
GarageCond 6
PavedDrive 3
PoolQC 3
Fence 5
MiscFeature 4
SaleType 10
SaleCondition 6
In [890]:
## Check dimensions of category columns of test data before doing dummification.
test_imp_cat.shape
Out[890]:
(1459, 46)
In [899]:
## Get dummies for category columns of test data,display dimesnionns and first 5 records.
test_catcols_dummy = pd.get_dummies(columns = test_imp_cat.columns, data = test_imp_cat, drop_first= True)
print(test_catcols_dummy.shape)
test_catcols_dummy.head()
(1459, 245)
Out[899]:
MSSubClass_150 MSSubClass_160 MSSubClass_180 MSSubClass_190 MSSubClass_20 MSSubClass_30 MSSubClass_40 MSSubClass_45 MSSubClass_50 MSSubClass_60 MSSubClass_70 MSSubClass_75 MSSubClass_80 MSSubClass_85 MSSubClass_90 MSZoning_FV MSZoning_RH MSZoning_RL MSZoning_RM MSZoning_nan Street_Pave Alley_NAA Alley_Pave LotShape_IR2 LotShape_IR3 LotShape_Reg LandContour_HLS LandContour_Low LandContour_Lvl Utilities_nan LotConfig_CulDSac LotConfig_FR2 LotConfig_FR3 LotConfig_Inside LandSlope_Mod LandSlope_Sev Neighborhood_Blueste Neighborhood_BrDale Neighborhood_BrkSide Neighborhood_ClearCr Neighborhood_CollgCr Neighborhood_Crawfor Neighborhood_Edwards Neighborhood_Gilbert Neighborhood_IDOTRR Neighborhood_MeadowV Neighborhood_Mitchel Neighborhood_NAmes Neighborhood_NPkVill Neighborhood_NWAmes Neighborhood_NoRidge Neighborhood_NridgHt Neighborhood_OldTown Neighborhood_SWISU Neighborhood_Sawyer Neighborhood_SawyerW Neighborhood_Somerst Neighborhood_StoneBr Neighborhood_Timber Neighborhood_Veenker Condition1_Feedr Condition1_Norm Condition1_PosA Condition1_PosN Condition1_RRAe Condition1_RRAn Condition1_RRNe Condition1_RRNn Condition2_Feedr Condition2_Norm Condition2_PosA Condition2_PosN BldgType_2fmCon BldgType_Duplex BldgType_Twnhs BldgType_TwnhsE HouseStyle_1.5Unf HouseStyle_1Story HouseStyle_2.5Unf HouseStyle_2Story HouseStyle_SFoyer HouseStyle_SLvl OverallQual_10 OverallQual_2 OverallQual_3 OverallQual_4 OverallQual_5 OverallQual_6 OverallQual_7 OverallQual_8 OverallQual_9 OverallCond_2 OverallCond_3 OverallCond_4 OverallCond_5 OverallCond_6 OverallCond_7 OverallCond_8 OverallCond_9 RoofStyle_Gable ... ExterCond_TA Foundation_CBlock Foundation_PConc Foundation_Slab Foundation_Stone Foundation_Wood BsmtQual_Fa BsmtQual_Gd BsmtQual_NB BsmtQual_TA BsmtCond_Gd BsmtCond_NB BsmtCond_Po BsmtCond_TA BsmtExposure_Gd BsmtExposure_Mn BsmtExposure_NB BsmtExposure_No BsmtFinType1_BLQ BsmtFinType1_GLQ BsmtFinType1_LwQ BsmtFinType1_NB BsmtFinType1_Rec BsmtFinType1_Unf BsmtFinType2_BLQ BsmtFinType2_GLQ BsmtFinType2_LwQ BsmtFinType2_NB BsmtFinType2_Rec BsmtFinType2_Unf Heating_GasW Heating_Grav Heating_Wall HeatingQC_Fa HeatingQC_Gd HeatingQC_Po HeatingQC_TA CentralAir_Y Electrical_FuseF Electrical_FuseP Electrical_SBrkr KitchenQual_Fa KitchenQual_Gd KitchenQual_TA KitchenQual_nan Functional_Maj2 Functional_Min1 Functional_Min2 Functional_Mod Functional_Sev Functional_Typ Functional_nan FireplaceQu_Fa FireplaceQu_Gd FireplaceQu_NF FireplaceQu_Po FireplaceQu_TA GarageType_Attchd GarageType_Basment GarageType_BuiltIn GarageType_CarPort GarageType_Detchd GarageType_NG GarageFinish_NG GarageFinish_RFn GarageFinish_Unf GarageQual_Gd GarageQual_NG GarageQual_Po GarageQual_TA GarageCond_Fa GarageCond_Gd GarageCond_NG GarageCond_Po GarageCond_TA PavedDrive_P PavedDrive_Y PoolQC_Gd PoolQC_NP Fence_GdWo Fence_MnPrv Fence_MnWw Fence_NF MiscFeature_NE MiscFeature_Othr MiscFeature_Shed SaleType_CWD SaleType_Con SaleType_ConLD SaleType_ConLI SaleType_ConLw SaleType_New SaleType_Oth SaleType_WD SaleType_nan SaleCondition_AdjLand SaleCondition_Alloca SaleCondition_Family SaleCondition_Normal SaleCondition_Partial
0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 ... 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 ... 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 ... 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
3 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 ... 1 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 ... 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0

5 rows × 245 columns

In [904]:
## Get missing columns in the test data set.
missing_cols = set( catcols_train_dummy.columns ) - set( test_catcols_dummy.columns )
## Add a missing column in test set with default value equal to 0.
for c in missing_cols:
    test_catcols_dummy[c] = 0
test_catcols_dummy = test_catcols_dummy[catcols_train_dummy.columns]
In [906]:
## Check dimesnions of categoory columns of test data.
test_catcols_dummy.shape
Out[906]:
(1459, 250)
In [907]:
## Combine cateory and numeric columns of test data.
test_data_combine = pd.concat([test_scaler, test_catcols_dummy], axis=1)
In [139]:
############################################### Decision Tree ##################################################################
In [909]:
## Import decision tree model.
from sklearn.tree import DecisionTreeRegressor
In [910]:
## Instantiate and fit a regression model.
dtr = DecisionTreeRegressor(max_depth=5,min_samples_leaf=10,min_samples_split=5,random_state=123)
dtr.fit(train_data_final,y_train)
Out[910]:
DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=5,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=10, min_samples_split=5,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=123, splitter='best')
In [911]:
## Get the predictions on train and validation data.
pred_train = dtr.predict(train_data_final)
pred_test = dtr.predict(test_data_final)
In [912]:
## Get predictions for test data.
test_pred = dtr.predict(test_data_combine)
In [932]:
## Check dimensions of test data index column.
test_data.index.shape
Out[932]:
(1459,)
In [933]:
## Check dimesnions of test predictions data.
test_pred.shape
Out[933]:
(1459,)
In [935]:
## Prepare a dataframe with test data index,prediction values.
dataframe5 = pd.DataFrame({'Id' : test_data.index,
                          'SalePrice' : test_pred})
In [936]:
## Copy dataframe data into a CSV file.
dataframe5.to_csv('PredictionValues.csv',index=False)
In [913]:
## Import error metric libraries to measure RMSE.
from sklearn.metrics import mean_squared_error
from math import sqrt
In [914]:
## Display train and validation RMSE.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 35145.50572611597
Test Error: 37672.08587542323
In [193]:
############################################## Random Forest ##################################################################
In [1]:
## Import random forest regressor model.
from sklearn.ensemble import RandomForestRegressor
In [938]:
## Instantiate a regressor model.
rc = RandomForestRegressor(n_estimators= 25, max_depth= 10)## ,min_samples_leaf = 2)## ,max_features='sqrt')
In [939]:
## Fit a model.
rc.fit(train_data_final,y_train)
Out[939]:
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=10, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=25, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)
In [940]:
## Get the predictions on train and validation data.
pred_train = rc.predict(train_data_final)
pred_test = rc.predict(test_data_final)
In [941]:
## Get predictions on test data.
test_pred = rc.predict(test_data_combine)
In [942]:
## Prepare a dataframe with test data index,prediction values.
dataframe6 = pd.DataFrame({'Id' : test_data.index,
                          'SalePrice' : test_pred})
In [944]:
## Copy dataframe data into a CSV file.
dataframe6.to_csv('PredictionValues.csv',index=False)
In [945]:
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 14189.808357485022
Test Error: 28007.627868664316
In [ ]:
################################################### AdaBoost ##################################################################
In [294]:
## Import adaboost regressor model.
from sklearn.ensemble import AdaBoostRegressor
In [310]:
## Instantiate regressor model and fit it.
Adaboost_model = AdaBoostRegressor(n_estimators=50,learning_rate=1)
%time Adaboost_model.fit(train_data_final, y_train)
Wall time: 424 ms
Out[310]:
AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
                  n_estimators=50, random_state=None)
In [311]:
## Get the predictions on train and validation data.
pred_train = Adaboost_model.predict(train_data_final)
pred_test = Adaboost_model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = Adaboost_model.predict(test_data_combine)
In [312]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 30520.585335079977
Test Error: 36586.41368072027
In [ ]:
##################################################### GradientBoosting #########################################################
In [313]:
## Import Graident boot model library.
from sklearn.ensemble import GradientBoostingRegressor
In [360]:
## Innstantiate GBR and fit it.
gbm = GradientBoostingRegressor(n_estimators=50,learning_rate=0.8,random_state=474)
%time gbm.fit(X=train_data_final, y=y_train)
Wall time: 318 ms
Out[360]:
GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.8, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=50,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=474, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)
In [361]:
## Get the predictions on train and validation.
pred_train = gbm.predict(train_data_final)
pred_test = gbm.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = gbm.predict(test_data_combine)
In [362]:
## Dispay RMSE value for train and validation.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 8570.43403879048
Test Error: 30745.40731980334
In [363]:
################################################## XGradient Boosting ##########################################################
In [2]:
## Import XGBoost model library.
import xgboost as xgb
from xgboost.sklearn import XGBRegressor
In [948]:
## Instantiate XGBR and fit it.
xgb_model=XGBRegressor(n_estimators=100,learning_rate=0.8)
%time xgb_model.fit(train_data_final,y_train,verbose=True)
C:\Users\nagar\Anaconda3\lib\site-packages\xgboost\core.py:587: FutureWarning: Series.base is deprecated and will be removed in a future version
  if getattr(data, 'base', None) is not None and \
[00:56:41] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Wall time: 1.42 s
Out[948]:
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0,
             importance_type='gain', learning_rate=0.8, max_delta_step=0,
             max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
             n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
             silent=None, subsample=1, verbosity=1)
In [949]:
## Get the predictions on train and validation.
pred_train = xgb_model.predict(train_data_final)
pred_test = xgb_model.predict(test_data_final)
In [950]:
## Get predictions on test data.
test_pred = xgb_model.predict(test_data_combine)
In [951]:
## Prepare a dataframe with test inndex,preidction values.
dataframe7 = pd.DataFrame({'Id' : test_data.index,
                          'SalePrice' : test_pred})
In [952]:
## Copy dataframe data into a CSV file.
dataframe7.to_csv('PredictionValues.csv',index=False)
In [971]:
## Display scatter plot for actual target and prediction values.
plt.figure(figsize=(15,8))
plt.scatter(y_train,pred_train, c= 'brown')
plt.xlabel('Y Train')
plt.ylabel('Predicted Y')
plt.show()
In [953]:
## Get RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 4641.200474416745
Test Error: 32621.566438550162
In [391]:
#################################################### SVM #######################################################################
In [395]:
## Import SVR model library.
from sklearn.svm import SVR
In [396]:
## Instantiate SVR model.
svr_model = SVR()
svr_model
Out[396]:
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
In [399]:
## Fit a model.
svr_model.fit(X = train_data_final, y = y_train)
Out[399]:
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
    kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
In [400]:
## Get the predictions on train and validation.
pred_train = svr_model.predict(train_data_final)
pred_test = svr_model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = svr_model.predict(test_data_combine)
In [401]:
## Get predictions on test data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 83315.89848130586
Test Error: 79142.24398988248
In [ ]:
################################################### KNN #######################################################################
In [537]:
## Import KNN model library.
from sklearn.neighbors import KNeighborsRegressor
In [548]:
## Instantiate KNN model and fit it.
knn = KNeighborsRegressor(algorithm = 'brute', n_neighbors = 4,
                           metric = "euclidean")
knn.fit(train_data_final, y_train)
Out[548]:
KNeighborsRegressor(algorithm='brute', leaf_size=30, metric='euclidean',
                    metric_params=None, n_jobs=None, n_neighbors=4, p=2,
                    weights='uniform')
In [549]:
## Get the predictions on train and validation.
pred_train = knn.predict(train_data_final)
pred_test = knn.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = knn.predict(test_data_combine)
In [550]:
## Display RMSE values fo train and validation.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 31531.213116588428
Test Error: 37136.77105383857
In [551]:
############################################# Neural Network Linear Algoritham #################################################
In [3]:
## Import Sequential,Dense model libraries.
from keras.models import Sequential
from keras.layers import Dense
C:\Users\nagar\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
In [562]:
## Instantiate squential model.
model = Sequential()

## Add dense model.
model.add(Dense(1, input_dim=train_data_final.shape[1]))

## Add compiler to model.
model.compile(loss='mse', optimizer='rmsprop')

## Fit a model.
model.fit(train_data_final, y_train, epochs=150, batch_size=32)
Epoch 1/150
1022/1022 [==============================] - 0s 184us/step - loss: 38950132341.2290
Epoch 2/150
1022/1022 [==============================] - 0s 65us/step - loss: 38949634272.4384
Epoch 3/150
1022/1022 [==============================] - 0s 73us/step - loss: 38949164879.6556
Epoch 4/150
1022/1022 [==============================] - 0s 65us/step - loss: 38948694541.0254
Epoch 5/150
1022/1022 [==============================] - 0s 63us/step - loss: 38948221424.9706
Epoch 6/150
1022/1022 [==============================] - 0s 62us/step - loss: 38947748821.9178
Epoch 7/150
1022/1022 [==============================] - 0s 63us/step - loss: 38947278976.2505
Epoch 8/150
1022/1022 [==============================] - 0s 64us/step - loss: 38946805475.4442
Epoch 9/150
1022/1022 [==============================] - 0s 64us/step - loss: 38946334836.2270
Epoch 10/150
1022/1022 [==============================] - 0s 63us/step - loss: 38945865423.4051
Epoch 11/150
1022/1022 [==============================] - 0s 45us/step - loss: 38945391818.3953
Epoch 12/150
1022/1022 [==============================] - 0s 30us/step - loss: 38944927595.7104
Epoch 13/150
1022/1022 [==============================] - 0s 38us/step - loss: 38944455750.1370
Epoch 14/150
1022/1022 [==============================] - 0s 62us/step - loss: 38943984902.5127
Epoch 15/150
1022/1022 [==============================] - 0s 68us/step - loss: 38943514335.4364
Epoch 16/150
1022/1022 [==============================] - 0s 66us/step - loss: 38943043559.9530
Epoch 17/150
1022/1022 [==============================] - 0s 114us/step - loss: 38942572123.1781
Epoch 18/150
1022/1022 [==============================] - 0s 118us/step - loss: 38942101592.1722
Epoch 19/150
1022/1022 [==============================] - 0s 105us/step - loss: 38941631490.0039
Epoch 20/150
1022/1022 [==============================] - 0s 113us/step - loss: 38941163191.3581
Epoch 21/150
1022/1022 [==============================] - 0s 98us/step - loss: 38940687963.1781
Epoch 22/150
1022/1022 [==============================] - 0s 113us/step - loss: 38940215340.0861
Epoch 23/150
1022/1022 [==============================] - 0s 126us/step - loss: 38939746087.5773
Epoch 24/150
1022/1022 [==============================] - 0s 118us/step - loss: 38939272278.1683
Epoch 25/150
1022/1022 [==============================] - 0s 119us/step - loss: 38938800705.1272
Epoch 26/150
1022/1022 [==============================] - 0s 102us/step - loss: 38938326987.8982
Epoch 27/150
1022/1022 [==============================] - 0s 128us/step - loss: 38937856156.3053
Epoch 28/150
1022/1022 [==============================] - 0s 103us/step - loss: 38937385749.5421
Epoch 29/150
1022/1022 [==============================] - 0s 128us/step - loss: 38936916128.3131
Epoch 30/150
1022/1022 [==============================] - 0s 139us/step - loss: 38936443036.3053
Epoch 31/150
1022/1022 [==============================] - 0s 124us/step - loss: 38935966850.2544
Epoch 32/150
1022/1022 [==============================] - 0s 125us/step - loss: 38935498335.1859
Epoch 33/150
1022/1022 [==============================] - 0s 121us/step - loss: 38935027595.7730
Epoch 34/150
1022/1022 [==============================] - 0s 126us/step - loss: 38934554575.9061
Epoch 35/150
1022/1022 [==============================] - 0s 117us/step - loss: 38934078297.6751
Epoch 36/150
1022/1022 [==============================] - 0s 127us/step - loss: 38933606452.1018
Epoch 37/150
1022/1022 [==============================] - 0s 123us/step - loss: 38933137327.8434
Epoch 38/150
1022/1022 [==============================] - 0s 103us/step - loss: 38932665971.2250
Epoch 39/150
1022/1022 [==============================] - 0s 147us/step - loss: 38932197985.1898
Epoch 40/150
1022/1022 [==============================] - 0s 148us/step - loss: 38931723446.3562
Epoch 41/150
1022/1022 [==============================] - 0s 128us/step - loss: 38931247705.1742
Epoch 42/150
1022/1022 [==============================] - 0s 124us/step - loss: 38930781755.1155
Epoch 43/150
1022/1022 [==============================] - 0s 121us/step - loss: 38930308238.2779
Epoch 44/150
1022/1022 [==============================] - 0s 112us/step - loss: 38929839057.9100
Epoch 45/150
1022/1022 [==============================] - 0s 109us/step - loss: 38929368971.7730
Epoch 46/150
1022/1022 [==============================] - 0s 98us/step - loss: 38928897426.7867
Epoch 47/150
1022/1022 [==============================] - 0s 101us/step - loss: 38928424799.6869
Epoch 48/150
1022/1022 [==============================] - 0s 116us/step - loss: 38927952926.0587
Epoch 49/150
1022/1022 [==============================] - 0s 105us/step - loss: 38927480607.5616
Epoch 50/150
1022/1022 [==============================] - 0s 105us/step - loss: 38927012705.6908
Epoch 51/150
1022/1022 [==============================] - 0s 112us/step - loss: 38926542968.2348
Epoch 52/150
1022/1022 [==============================] - 0s 97us/step - loss: 38926072485.3229
Epoch 53/150
1022/1022 [==============================] - 0s 101us/step - loss: 38925599024.5949
Epoch 54/150
1022/1022 [==============================] - 0s 115us/step - loss: 38925129114.8023
Epoch 55/150
1022/1022 [==============================] - 0s 106us/step - loss: 38924656904.5166
Epoch 56/150
1022/1022 [==============================] - 0s 101us/step - loss: 38924186962.6614
Epoch 57/150
1022/1022 [==============================] - 0s 110us/step - loss: 38923716331.4599
Epoch 58/150
1022/1022 [==============================] - 0s 98us/step - loss: 38923243115.2094
Epoch 59/150
1022/1022 [==============================] - 0s 40us/step - loss: 38922773994.9589
Epoch 60/150
1022/1022 [==============================] - 0s 50us/step - loss: 38922302289.6595
Epoch 61/150
1022/1022 [==============================] - 0s 113us/step - loss: 38921833646.3405
Epoch 62/150
1022/1022 [==============================] - 0s 117us/step - loss: 38921363275.6477
Epoch 63/150
1022/1022 [==============================] - 0s 113us/step - loss: 38920891438.0900
Epoch 64/150
1022/1022 [==============================] - 0s 127us/step - loss: 38920420109.5264
Epoch 65/150
1022/1022 [==============================] - 0s 115us/step - loss: 38919952404.0391
Epoch 66/150
1022/1022 [==============================] - 0s 121us/step - loss: 38919485283.6947
Epoch 67/150
1022/1022 [==============================] - 0s 139us/step - loss: 38919016111.3425
Epoch 68/150
1022/1022 [==============================] - 0s 128us/step - loss: 38918539676.8063
Epoch 69/150
1022/1022 [==============================] - ETA: 0s - loss: 38476961319.384 - 0s 145us/step - loss: 38918067819.2094
Epoch 70/150
1022/1022 [==============================] - 0s 104us/step - loss: 38917594639.0294
Epoch 71/150
1022/1022 [==============================] - 0s 71us/step - loss: 38917127137.9413
Epoch 72/150
1022/1022 [==============================] - 0s 79us/step - loss: 38916659083.7730
Epoch 73/150
1022/1022 [==============================] - 0s 75us/step - loss: 38916187659.0215
Epoch 74/150
1022/1022 [==============================] - 0s 69us/step - loss: 38915717332.4149
Epoch 75/150
1022/1022 [==============================] - 0s 65us/step - loss: 38915245623.1076
Epoch 76/150
1022/1022 [==============================] - 0s 64us/step - loss: 38914777031.8904
Epoch 77/150
1022/1022 [==============================] - 0s 63us/step - loss: 38914304348.6810
Epoch 78/150
1022/1022 [==============================] - 0s 65us/step - loss: 38913836699.3033
Epoch 79/150
1022/1022 [==============================] - 0s 61us/step - loss: 38913364400.8454
Epoch 80/150
1022/1022 [==============================] - 0s 62us/step - loss: 38912893204.5401
Epoch 81/150
1022/1022 [==============================] - 0s 68us/step - loss: 38912422236.6810
Epoch 82/150
1022/1022 [==============================] - 0s 64us/step - loss: 38911953693.5577
Epoch 83/150
1022/1022 [==============================] - 0s 67us/step - loss: 38911478044.5558
Epoch 84/150
1022/1022 [==============================] - 0s 65us/step - loss: 38911008928.3131
Epoch 85/150
1022/1022 [==============================] - 0s 63us/step - loss: 38910541519.4051
Epoch 86/150
1022/1022 [==============================] - 0s 65us/step - loss: 38910071152.7202
Epoch 87/150
1022/1022 [==============================] - 0s 68us/step - loss: 38909592698.2387
Epoch 88/150
1022/1022 [==============================] - 0s 63us/step - loss: 38909126179.0685
Epoch 89/150
1022/1022 [==============================] - 0s 65us/step - loss: 38908655884.5245
Epoch 90/150
1022/1022 [==============================] - 0s 68us/step - loss: 38908182511.9687
Epoch 91/150
1022/1022 [==============================] - 0s 62us/step - loss: 38907713459.8513
Epoch 92/150
1022/1022 [==============================] - 0s 64us/step - loss: 38907244191.3112
Epoch 93/150
1022/1022 [==============================] - 0s 65us/step - loss: 38906776886.6067
Epoch 94/150
1022/1022 [==============================] - 0s 64us/step - loss: 38906303602.2231
Epoch 95/150
1022/1022 [==============================] - 0s 66us/step - loss: 38905834966.9198
Epoch 96/150
1022/1022 [==============================] - 0s 65us/step - loss: 38905361722.6145
Epoch 97/150
1022/1022 [==============================] - 0s 42us/step - loss: 38904891412.0391
Epoch 98/150
1022/1022 [==============================] - 0s 34us/step - loss: 38904422840.8611
Epoch 99/150
1022/1022 [==============================] - 0s 36us/step - loss: 38903954566.2622
Epoch 100/150
1022/1022 [==============================] - 0s 49us/step - loss: 38903485666.4423
Epoch 101/150
1022/1022 [==============================] - 0s 66us/step - loss: 38903014397.9961
Epoch 102/150
1022/1022 [==============================] - 0s 68us/step - loss: 38902547193.4873
Epoch 103/150
1022/1022 [==============================] - 0s 68us/step - loss: 38902074333.9335
Epoch 104/150
1022/1022 [==============================] - 0s 65us/step - loss: 38901601746.9119
Epoch 105/150
1022/1022 [==============================] - 0s 64us/step - loss: 38901134750.8102
Epoch 106/150
1022/1022 [==============================] - 0s 65us/step - loss: 38900659698.9746
Epoch 107/150
1022/1022 [==============================] - 0s 62us/step - loss: 38900187865.4247
Epoch 108/150
1022/1022 [==============================] - 0s 71us/step - loss: 38899720721.0333
Epoch 109/150
1022/1022 [==============================] - 0s 67us/step - loss: 38899249524.7280
Epoch 110/150
1022/1022 [==============================] - 0s 64us/step - loss: 38898777454.7162
Epoch 111/150
1022/1022 [==============================] - 0s 69us/step - loss: 38898310150.0117
Epoch 112/150
1022/1022 [==============================] - 0s 69us/step - loss: 38897836921.7378
Epoch 113/150
1022/1022 [==============================] - 0s 67us/step - loss: 38897367448.7984
Epoch 114/150
1022/1022 [==============================] - 0s 39us/step - loss: 38896895298.6301
Epoch 115/150
1022/1022 [==============================] - 0s 47us/step - loss: 38896427416.7984
Epoch 116/150
1022/1022 [==============================] - 0s 60us/step - loss: 38895957619.2250
Epoch 117/150
1022/1022 [==============================] - 0s 62us/step - loss: 38895486386.8493
Epoch 118/150
1022/1022 [==============================] - 0s 65us/step - loss: 38895017154.3796
Epoch 119/150
1022/1022 [==============================] - 0s 70us/step - loss: 38894545220.6341
Epoch 120/150
1022/1022 [==============================] - 0s 68us/step - loss: 38894076917.9804
Epoch 121/150
1022/1022 [==============================] - 0s 70us/step - loss: 38893607437.0254
Epoch 122/150
1022/1022 [==============================] - 0s 59us/step - loss: 38893137831.8278
Epoch 123/150
1022/1022 [==============================] - 0s 71us/step - loss: 38892668459.0841
Epoch 124/150
1022/1022 [==============================] - 0s 74us/step - loss: 38892196096.5010
Epoch 125/150
1022/1022 [==============================] - 0s 66us/step - loss: 38891727821.9022
Epoch 126/150
1022/1022 [==============================] - 0s 76us/step - loss: 38891260501.1663
Epoch 127/150
1022/1022 [==============================] - 0s 77us/step - loss: 38890792571.2407
Epoch 128/150
1022/1022 [==============================] - 0s 68us/step - loss: 38890321996.1487
Epoch 129/150
1022/1022 [==============================] - 0s 45us/step - loss: 38889853160.4540
Epoch 130/150
1022/1022 [==============================] - 0s 37us/step - loss: 38889383659.4599
Epoch 131/150
1022/1022 [==============================] - 0s 38us/step - loss: 38888916731.4912
Epoch 132/150
1022/1022 [==============================] - 0s 45us/step - loss: 38888446685.4325
Epoch 133/150
1022/1022 [==============================] - 0s 75us/step - loss: 38887980827.5538
Epoch 134/150
1022/1022 [==============================] - 0s 62us/step - loss: 38887511867.6164
Epoch 135/150
1022/1022 [==============================] - 0s 66us/step - loss: 38887042005.9178
Epoch 136/150
1022/1022 [==============================] - 0s 72us/step - loss: 38886571346.6614
Epoch 137/150
1022/1022 [==============================] - 0s 72us/step - loss: 38886098130.4110
Epoch 138/150
1022/1022 [==============================] - 0s 69us/step - loss: 38885629767.6399
Epoch 139/150
1022/1022 [==============================] - 0s 66us/step - loss: 38885158342.8885
Epoch 140/150
1022/1022 [==============================] - 0s 65us/step - loss: 38884692785.5969
Epoch 141/150
1022/1022 [==============================] - 0s 67us/step - loss: 38884222058.2074
Epoch 142/150
1022/1022 [==============================] - 0s 69us/step - loss: 38883749519.2798
Epoch 143/150
1022/1022 [==============================] - 0s 67us/step - loss: 38883275232.9393
Epoch 144/150
1022/1022 [==============================] - 0s 63us/step - loss: 38882804938.3953
Epoch 145/150
1022/1022 [==============================] - 0s 63us/step - loss: 38882336046.5910
Epoch 146/150
1022/1022 [==============================] - 0s 42us/step - loss: 38881863659.9609
Epoch 147/150
1022/1022 [==============================] - 0s 52us/step - loss: 38881392503.7339
Epoch 148/150
1022/1022 [==============================] - 0s 57us/step - loss: 38880921403.6164
Epoch 149/150
1022/1022 [==============================] - 0s 64us/step - loss: 38880450804.4775
Epoch 150/150
1022/1022 [==============================] - 0s 61us/step - loss: 38879982473.7691
Out[562]:
<keras.callbacks.callbacks.History at 0x209042bbac8>
In [563]:
## Get the predictions on train and validation.
pred_train = model.predict(train_data_final)
pred_test = model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = model.predict(test_data_combine)
In [564]:
## Display RMSE value for train and validation.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 197179.45735965777
Test Error: 197930.06369128206
In [566]:
## Instantiate sequential model.
model1 = Sequential()

## Add 2 dense layes to model.
model1.add(Dense(8, input_dim=train_data_final.shape[1], activation='relu', kernel_initializer='uniform'))
model1.add(Dense(1, kernel_initializer='uniform'))

## Add compiler to model.
model.compile(loss='mse', optimizer='rmsprop')

## Fit a model.
model.fit(train_data_final, y_train, epochs=150, batch_size=32)
Epoch 1/150
1022/1022 [==============================] - 0s 101us/step - loss: 38879393034.5205
Epoch 2/150
1022/1022 [==============================] - 0s 34us/step - loss: 38878900817.1585
Epoch 3/150
1022/1022 [==============================] - 0s 34us/step - loss: 38878430346.2701
Epoch 4/150
1022/1022 [==============================] - 0s 32us/step - loss: 38877960348.3053
Epoch 5/150
1022/1022 [==============================] - 0s 35us/step - loss: 38877490029.7143
Epoch 6/150
1022/1022 [==============================] - 0s 32us/step - loss: 38877020520.7045
Epoch 7/150
1022/1022 [==============================] - 0s 40us/step - loss: 38876549120.0000
Epoch 8/150
1022/1022 [==============================] - ETA: 0s - loss: 53787824128.000 - 0s 34us/step - loss: 38876078805.4168
Epoch 9/150
1022/1022 [==============================] - 0s 32us/step - loss: 38875613256.1409
Epoch 10/150
1022/1022 [==============================] - 0s 35us/step - loss: 38875140568.9237
Epoch 11/150
1022/1022 [==============================] - 0s 37us/step - loss: 38874668246.4188
Epoch 12/150
1022/1022 [==============================] - 0s 37us/step - loss: 38874196156.3679
Epoch 13/150
1022/1022 [==============================] - 0s 44us/step - loss: 38873722928.0939
Epoch 14/150
1022/1022 [==============================] - 0s 39us/step - loss: 38873252593.4716
Epoch 15/150
1022/1022 [==============================] - 0s 46us/step - loss: 38872781441.2524
Epoch 16/150
1022/1022 [==============================] - 0s 45us/step - loss: 38872309158.8258
Epoch 17/150
1022/1022 [==============================] - 0s 45us/step - loss: 38871834415.5930
Epoch 18/150
1022/1022 [==============================] - 0s 54us/step - loss: 38871366469.6360
Epoch 19/150
1022/1022 [==============================] - 0s 61us/step - loss: 38870895597.9648
Epoch 20/150
1022/1022 [==============================] - 0s 64us/step - loss: 38870424926.6849
Epoch 21/150
1022/1022 [==============================] - 0s 62us/step - loss: 38869953277.4951
Epoch 22/150
1022/1022 [==============================] - 0s 63us/step - loss: 38869486557.9335
Epoch 23/150
1022/1022 [==============================] - 0s 67us/step - loss: 38869017337.4873
Epoch 24/150
1022/1022 [==============================] - 0s 32us/step - loss: 38868544582.1370
Epoch 25/150
1022/1022 [==============================] - 0s 31us/step - loss: 38868074804.6027
Epoch 26/150
1022/1022 [==============================] - 0s 39us/step - loss: 38867604017.0959
Epoch 27/150
1022/1022 [==============================] - 0s 45us/step - loss: 38867137850.6145
Epoch 28/150
1022/1022 [==============================] - 0s 62us/step - loss: 38866666229.4795
Epoch 29/150
1022/1022 [==============================] - 0s 63us/step - loss: 38866194840.7984
Epoch 30/150
1022/1022 [==============================] - 0s 65us/step - loss: 38865725179.4912
Epoch 31/150
1022/1022 [==============================] - 0s 61us/step - loss: 38865252632.5479
Epoch 32/150
1022/1022 [==============================] - 0s 63us/step - loss: 38864782790.8885
Epoch 33/150
1022/1022 [==============================] - 0s 61us/step - loss: 38864315578.3640
Epoch 34/150
1022/1022 [==============================] - 0s 61us/step - loss: 38863844826.9276
Epoch 35/150
1022/1022 [==============================] - 0s 59us/step - loss: 38863375366.0117
Epoch 36/150
1022/1022 [==============================] - 0s 58us/step - loss: 38862903973.3229
Epoch 37/150
1022/1022 [==============================] - 0s 50us/step - loss: 38862436492.2740
Epoch 38/150
1022/1022 [==============================] - 0s 40us/step - loss: 38861967436.1487
Epoch 39/150
1022/1022 [==============================] - 0s 49us/step - loss: 38861495686.7632
Epoch 40/150
1022/1022 [==============================] - 0s 59us/step - loss: 38861025031.5147
Epoch 41/150
1022/1022 [==============================] - 0s 64us/step - loss: 38860555013.5108
Epoch 42/150
1022/1022 [==============================] - 0s 49us/step - loss: 38860082999.6086
Epoch 43/150
1022/1022 [==============================] - 0s 33us/step - loss: 38859612272.2192
Epoch 44/150
1022/1022 [==============================] - 0s 36us/step - loss: 38859143396.4462
Epoch 45/150
1022/1022 [==============================] - 0s 38us/step - loss: 38858674424.4853
Epoch 46/150
1022/1022 [==============================] - 0s 54us/step - loss: 38858204354.3796
Epoch 47/150
1022/1022 [==============================] - 0s 67us/step - loss: 38857734532.7593
Epoch 48/150
1022/1022 [==============================] - 0s 67us/step - loss: 38857264013.7769
Epoch 49/150
1022/1022 [==============================] - 0s 43us/step - loss: 38856790400.7515
Epoch 50/150
1022/1022 [==============================] - 0s 40us/step - loss: 38856322438.7632
Epoch 51/150
1022/1022 [==============================] - 0s 64us/step - loss: 38855850036.1018
Epoch 52/150
1022/1022 [==============================] - 0s 84us/step - loss: 38855380947.9139
Epoch 53/150
1022/1022 [==============================] - 0s 89us/step - loss: 38854905615.5303
Epoch 54/150
1022/1022 [==============================] - 0s 84us/step - loss: 38854435296.9393
Epoch 55/150
1022/1022 [==============================] - 0s 82us/step - loss: 38853971006.1213
Epoch 56/150
1022/1022 [==============================] - 0s 86us/step - loss: 38853499332.8845
Epoch 57/150
1022/1022 [==============================] - 0s 80us/step - loss: 38853030064.3444
Epoch 58/150
1022/1022 [==============================] - 0s 50us/step - loss: 38852561645.4638
Epoch 59/150
1022/1022 [==============================] - 0s 30us/step - loss: 38852092801.7534
Epoch 60/150
1022/1022 [==============================] - 0s 42us/step - loss: 38851620162.6301
Epoch 61/150
1022/1022 [==============================] - 0s 62us/step - loss: 38851149595.5538
Epoch 62/150
1022/1022 [==============================] - 0s 62us/step - loss: 38850680667.6791
Epoch 63/150
1022/1022 [==============================] - 0s 65us/step - loss: 38850207335.2016
Epoch 64/150
1022/1022 [==============================] - 0s 69us/step - loss: 38849736367.3425
Epoch 65/150
1022/1022 [==============================] - 0s 64us/step - loss: 38849262698.2074
Epoch 66/150
1022/1022 [==============================] - 0s 66us/step - loss: 38848793261.3386
Epoch 67/150
1022/1022 [==============================] - 0s 61us/step - loss: 38848324473.7378
Epoch 68/150
1022/1022 [==============================] - 0s 35us/step - loss: 38847850956.9002
Epoch 69/150
1022/1022 [==============================] - 0s 73us/step - loss: 38847385896.5793
Epoch 70/150
1022/1022 [==============================] - 0s 75us/step - loss: 38846918183.0763
Epoch 71/150
1022/1022 [==============================] - 0s 77us/step - loss: 38846447355.4912
Epoch 72/150
1022/1022 [==============================] - 0s 81us/step - loss: 38845979124.9785
Epoch 73/150
1022/1022 [==============================] - 0s 76us/step - loss: 38845504710.3875
Epoch 74/150
1022/1022 [==============================] - 0s 60us/step - loss: 38845038696.2035
Epoch 75/150
1022/1022 [==============================] - 0s 32us/step - loss: 38844569908.6027
Epoch 76/150
1022/1022 [==============================] - 0s 50us/step - loss: 38844095790.5910
Epoch 77/150
1022/1022 [==============================] - 0s 62us/step - loss: 38843621504.2505
Epoch 78/150
1022/1022 [==============================] - 0s 62us/step - loss: 38843153365.9178
Epoch 79/150
1022/1022 [==============================] - 0s 69us/step - loss: 38842683680.5636
Epoch 80/150
1022/1022 [==============================] - 0s 67us/step - loss: 38842210243.8826
Epoch 81/150
1022/1022 [==============================] - 0s 71us/step - loss: 38841743267.8200
Epoch 82/150
1022/1022 [==============================] - 0s 34us/step - loss: 38841270031.5303
Epoch 83/150
1022/1022 [==============================] - 0s 51us/step - loss: 38840799360.2505
Epoch 84/150
1022/1022 [==============================] - 0s 63us/step - loss: 38840329474.5049
Epoch 85/150
1022/1022 [==============================] - 0s 66us/step - loss: 38839858915.4442
Epoch 86/150
1022/1022 [==============================] - 0s 69us/step - loss: 38839388532.7280
Epoch 87/150
1022/1022 [==============================] - 0s 63us/step - loss: 38838917364.4775
Epoch 88/150
1022/1022 [==============================] - 0s 69us/step - loss: 38838450797.2133
Epoch 89/150
1022/1022 [==============================] - 0s 66us/step - loss: 38837976238.3405
Epoch 90/150
1022/1022 [==============================] - 0s 66us/step - loss: 38837507106.0665
Epoch 91/150
1022/1022 [==============================] - 0s 48us/step - loss: 38837029918.0587
Epoch 92/150
1022/1022 [==============================] - 0s 31us/step - loss: 38836562589.3072
Epoch 93/150
1022/1022 [==============================] - 0s 34us/step - loss: 38836093553.2211
Epoch 94/150
1022/1022 [==============================] - 0s 39us/step - loss: 38835620733.7456
Epoch 95/150
1022/1022 [==============================] - 0s 56us/step - loss: 38835155320.7358
Epoch 96/150
1022/1022 [==============================] - 0s 67us/step - loss: 38834682421.1037
Epoch 97/150
1022/1022 [==============================] - 0s 66us/step - loss: 38834213946.1135
Epoch 98/150
1022/1022 [==============================] - 0s 78us/step - loss: 38833745302.7945
Epoch 99/150
1022/1022 [==============================] - 0s 67us/step - loss: 38833277949.9961
Epoch 100/150
1022/1022 [==============================] - 0s 66us/step - loss: 38832806332.8689
Epoch 101/150
1022/1022 [==============================] - 0s 53us/step - loss: 38832337501.1820
Epoch 102/150
1022/1022 [==============================] - 0s 40us/step - loss: 38831865956.1957
Epoch 103/150
1022/1022 [==============================] - 0s 58us/step - loss: 38831397685.6047
Epoch 104/150
1022/1022 [==============================] - 0s 78us/step - loss: 38830925483.3346
Epoch 105/150
1022/1022 [==============================] - 0s 87us/step - loss: 38830453982.4344
Epoch 106/150
1022/1022 [==============================] - 0s 86us/step - loss: 38829983651.8200
Epoch 107/150
1022/1022 [==============================] - 0s 81us/step - loss: 38829512884.3523
Epoch 108/150
1022/1022 [==============================] - 0s 31us/step - loss: 38829046777.9883
Epoch 109/150
1022/1022 [==============================] - 0s 33us/step - loss: 38828575269.0724
Epoch 110/150
1022/1022 [==============================] - 0s 35us/step - loss: 38828102978.6301
Epoch 111/150
1022/1022 [==============================] - 0s 51us/step - loss: 38827629694.2466
Epoch 112/150
1022/1022 [==============================] - 0s 72us/step - loss: 38827161531.8669
Epoch 113/150
1022/1022 [==============================] - 0s 70us/step - loss: 38826688984.9237
Epoch 114/150
1022/1022 [==============================] - 0s 69us/step - loss: 38826218225.4716
Epoch 115/150
1022/1022 [==============================] - 0s 65us/step - loss: 38825744219.6791
Epoch 116/150
1022/1022 [==============================] - 0s 71us/step - loss: 38825274935.1076
Epoch 117/150
1022/1022 [==============================] - 0s 66us/step - loss: 38824804792.8611
Epoch 118/150
1022/1022 [==============================] - 0s 79us/step - loss: 38824336971.1468
Epoch 119/150
1022/1022 [==============================] - 0s 82us/step - loss: 38823862913.2524
Epoch 120/150
1022/1022 [==============================] - 0s 80us/step - loss: 38823389833.2681
Epoch 121/150
1022/1022 [==============================] - 0s 77us/step - loss: 38822922268.0548
Epoch 122/150
1022/1022 [==============================] - 0s 72us/step - loss: 38822453961.3933
Epoch 123/150
1022/1022 [==============================] - 0s 73us/step - loss: 38821983402.3327
Epoch 124/150
1022/1022 [==============================] - 0s 75us/step - loss: 38821516690.7867
Epoch 125/150
1022/1022 [==============================] - 0s 74us/step - loss: 38821048784.9080
Epoch 126/150
1022/1022 [==============================] - 0s 70us/step - loss: 38820574662.8885
Epoch 127/150
1022/1022 [==============================] - 0s 79us/step - loss: 38820103218.0978
Epoch 128/150
1022/1022 [==============================] - 0s 71us/step - loss: 38819632410.5519
Epoch 129/150
1022/1022 [==============================] - 0s 69us/step - loss: 38819166131.8513
Epoch 130/150
1022/1022 [==============================] - 0s 67us/step - loss: 38818695709.0568
Epoch 131/150
1022/1022 [==============================] - 0s 59us/step - loss: 38818223089.9726
Epoch 132/150
1022/1022 [==============================] - 0s 83us/step - loss: 38817755961.6125
Epoch 133/150
1022/1022 [==============================] - 0s 79us/step - loss: 38817283342.5284
Epoch 134/150
1022/1022 [==============================] - 0s 81us/step - loss: 38816815845.4481
Epoch 135/150
1022/1022 [==============================] - 0s 79us/step - loss: 38816346368.5010
Epoch 136/150
1022/1022 [==============================] - 0s 80us/step - loss: 38815873096.1409
Epoch 137/150
1022/1022 [==============================] - 0s 37us/step - loss: 38815397924.0704
Epoch 138/150
1022/1022 [==============================] - 0s 32us/step - loss: 38814930835.7886
Epoch 139/150
1022/1022 [==============================] - 0s 38us/step - loss: 38814460196.5714
Epoch 140/150
1022/1022 [==============================] - 0s 37us/step - loss: 38813990374.9511
Epoch 141/150
1022/1022 [==============================] - 0s 48us/step - loss: 38813525507.0059
Epoch 142/150
1022/1022 [==============================] - 0s 60us/step - loss: 38813056795.5538
Epoch 143/150
1022/1022 [==============================] - 0s 64us/step - loss: 38812583567.2798
Epoch 144/150
1022/1022 [==============================] - 0s 68us/step - loss: 38812111681.6282
Epoch 145/150
1022/1022 [==============================] - 0s 68us/step - loss: 38811647018.0822
Epoch 146/150
1022/1022 [==============================] - 0s 66us/step - loss: 38811179933.8082
Epoch 147/150
1022/1022 [==============================] - 0s 63us/step - loss: 38810707531.1468
Epoch 148/150
1022/1022 [==============================] - 0s 88us/step - loss: 38810237477.0724
Epoch 149/150
1022/1022 [==============================] - 0s 87us/step - loss: 38809761735.8904
Epoch 150/150
1022/1022 [==============================] - 0s 90us/step - loss: 38809294607.5303
Out[566]:
<keras.callbacks.callbacks.History at 0x20913acf668>
In [568]:
## Get the predictions on train and validation.
pred_train = model1.predict(train_data_final)
pred_test = model1.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = model1.predict(test_data_combine)
In [569]:
## Display RMSE value for train and vallidation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 197358.3163285996
Test Error: 198108.21506572075
In [570]:
############################################## PCA ###########################################################################
In [571]:
## Import PCA model library.
from sklearn.decomposition import PCA
In [607]:
## Instantiate PCA model and fit it.
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X_train_scaler)
In [608]:
## Get dimensions of train data.
train_data_final.shape
Out[608]:
(1022, 277)
In [609]:
## Get dimensions of pca coponnents.
principalComponents.shape
Out[609]:
(1022, 2)
In [610]:
## Display  principal components.
print(principalComponents)
[[-0.8954133  -2.18631755]
 [-2.66539918  0.43482128]
 [-2.09868063 -0.79797737]
 ...
 [ 2.48650134  0.28487654]
 [-1.46838751 -2.10796887]
 [-1.58717077 -0.00724692]]
In [611]:
## Prepare a dataframe with principal component.
principalDf = pd.DataFrame(data = principalComponents
             , columns = ['principal component 1', 'principal component 2'])
In [612]:
## Get first  5 records of principal component.
principalDf.head()
Out[612]:
principal component 1 principal component 2
0 -0.895413 -2.186318
1 -2.665399 0.434821
2 -2.098681 -0.797977
3 -0.056531 -0.031794
4 0.845449 -2.374163
In [613]:
## Get  varience ratio.
pca.explained_variance_ratio_
Out[613]:
array([0.21192618, 0.10387824])
In [614]:
## Get components.
pca.components_
Out[614]:
array([[ 0.21954716,  0.13919059,  0.25063762,  0.21109311,  0.22612689,
         0.18639512,  0.00789492,  0.1209049 ,  0.31321585,  0.31675222,
         0.11520559, -0.01314929,  0.32642755,  0.12633061,  0.26510875,
         0.22884204,  0.31064052,  0.3227923 ,  0.139032  ,  0.16942426,
        -0.08594562,  0.02502759,  0.0376397 ,  0.09271425,  0.0159143 ,
         0.01729143, -0.01169429],
       [ 0.12553739,  0.09021614, -0.32818343, -0.20498363, -0.02720514,
        -0.13671027,  0.0109314 ,  0.03405799, -0.10444001, -0.03336502,
         0.39664523,  0.18821754,  0.31354295,  0.41962082,  0.37018061,
        -0.28723628, -0.11852569, -0.11385579, -0.05821965,  0.03671219,
         0.19723035, -0.02522341,  0.06950059,  0.14015004,  0.09141491,
         0.03752889, -0.0379974 ]])
In [615]:
## Get features value.
pca.n_features_
Out[615]:
27
In [621]:
## Get the predictions on train and validation data.
train_pca = pca.transform(X_train_scaler)
test_pca = pca.transform(X_test_scaler)
In [623]:
## Display train prediction vallues.
train_pca
Out[623]:
array([[-0.89541298, -2.18634211],
       [-2.66540047,  0.435051  ],
       [-2.09868065, -0.79802406],
       ...,
       [ 2.48650239,  0.28477943],
       [-1.46838756, -2.10795479],
       [-1.58717065, -0.00720368]])
In [624]:
## Display validation prediction vallues.
test_pca
Out[624]:
array([[ 1.15631213e+00,  5.95098354e-01],
       [-4.51084517e-01,  3.07914898e+00],
       [ 1.70842552e-01, -3.86034495e-01],
       [ 2.13042556e+00,  3.09347366e-01],
       [-1.92999453e+00,  9.29849697e-01],
       [ 3.50776552e+00, -1.06220920e+00],
       [ 3.46897325e+00,  1.70532045e+00],
       [-2.66206957e+00, -4.68360625e-01],
       [-4.79949640e-01, -1.35964165e-02],
       [-1.51951014e+00, -6.16007414e-01],
       [ 5.75927321e-01, -1.68190468e+00],
       [ 1.21260292e+00,  5.76601800e-01],
       [-8.75171999e-01, -2.57685380e+00],
       [-1.87794288e+00, -1.71568191e-01],
       [ 2.61642739e+00,  2.37553244e+00],
       [ 4.52825404e-01, -7.90378504e-01],
       [ 2.44573639e-02,  1.06509568e+00],
       [ 4.23283242e+00, -7.68756629e-01],
       [ 1.06689672e+00, -2.00248321e-01],
       [ 5.81676000e-01,  1.59294388e+00],
       [-8.95318581e-01, -5.63821750e-01],
       [ 2.50095525e-01,  1.83548069e+00],
       [-2.81000784e+00, -4.06339451e-01],
       [-1.96392923e+00,  1.30764016e+00],
       [ 9.01045732e-01, -3.34261117e-01],
       [ 1.08580841e+00, -1.18578419e+00],
       [ 1.69796945e+00, -1.31013720e+00],
       [ 6.85113862e-01,  1.64829298e-01],
       [-1.64203118e+00, -5.60084235e-01],
       [-1.34744183e-01, -8.43595522e-01],
       [-9.06787824e-01, -3.06272811e-01],
       [ 1.53601651e+00, -1.14929645e+00],
       [-7.87317929e-01, -3.41414787e-01],
       [ 1.60495797e+00, -1.79014071e+00],
       [ 4.15783199e+00, -2.97299661e+00],
       [ 1.59888023e+00, -1.53721895e+00],
       [ 1.00808006e-01, -1.30452524e+00],
       [ 4.44819933e+00,  7.30975737e+00],
       [ 6.28605006e-01, -5.52121223e-01],
       [-3.24248763e+00,  1.56586965e+00],
       [-1.02348712e+00, -1.87929718e+00],
       [ 1.81907364e+00, -2.16475571e+00],
       [ 1.42281589e+00,  6.67058593e-01],
       [ 1.09169381e+00, -2.24876358e+00],
       [ 1.73841080e+00, -7.87096855e-01],
       [-2.85085831e+00, -7.82668820e-01],
       [ 4.02556212e+00, -1.22034661e+00],
       [ 1.10099290e+00,  1.82921313e-01],
       [ 1.94271868e+00,  3.46748783e-02],
       [-2.88007069e+00,  1.87603014e+00],
       [ 9.72037371e-01,  1.55665603e+00],
       [ 5.12336737e-01, -3.36790312e+00],
       [-1.01291833e+00, -6.59773884e-01],
       [ 1.57729074e-01, -4.19008707e-01],
       [-3.19854879e+00,  1.48457420e+00],
       [-8.77609394e-01,  8.79886358e-01],
       [-2.24898705e+00, -9.51207750e-01],
       [ 6.65703150e-01, -9.67564626e-01],
       [ 2.37442540e+00, -1.30639424e+00],
       [ 4.16994292e+00, -1.17350171e+00],
       [-1.51862276e+00, -1.42411530e+00],
       [ 3.64927708e+00,  1.32611174e+00],
       [-1.77630674e+00,  4.09452419e+00],
       [-3.56729582e-01, -1.02624538e+00],
       [ 9.34371362e-01,  2.03568283e-01],
       [-2.61016086e+00,  3.30155479e+00],
       [-3.47932428e+00,  1.32291856e+00],
       [ 3.04321680e+00, -1.44706253e+00],
       [ 2.13492487e+00, -1.12013206e+00],
       [-6.07508391e-01, -5.44293038e-01],
       [-1.06487187e+00, -1.85578362e+00],
       [ 1.48250602e+00,  1.55901104e+00],
       [-8.77895129e-01, -1.84138137e+00],
       [ 3.24162650e+00, -1.30819064e+00],
       [-5.78243624e-01,  2.48765908e+00],
       [-2.37609723e+00, -2.81152823e-01],
       [ 2.47215018e-01, -3.34708978e-01],
       [-1.20025313e+00, -1.55412327e+00],
       [ 1.27032515e+00,  9.64662650e-01],
       [-2.09795618e+00, -3.11555082e+00],
       [ 3.48267560e+00, -4.77482269e-01],
       [ 4.54415875e+00, -1.99007660e+00],
       [-4.52901423e+00, -2.99780505e-01],
       [-1.98888380e+00,  3.71989732e-02],
       [ 3.40050573e+00, -1.16422339e+00],
       [-1.13748454e+00, -2.12438847e-01],
       [-3.99488094e+00,  2.62272427e-02],
       [-2.58950513e+00,  4.43930219e-01],
       [-3.43592026e+00, -2.14991395e-02],
       [ 3.65538290e+00, -9.50462573e-01],
       [-1.75605494e+00,  7.93575681e-01],
       [ 5.99155256e-01,  1.47865977e-01],
       [ 2.20900346e-01,  1.52401824e-01],
       [-1.13667723e+00,  1.84341028e+00],
       [-7.47092238e-02, -1.70063443e+00],
       [-2.80871136e+00,  1.81525865e+00],
       [-5.47329247e-01,  5.02436921e+00],
       [-2.74146997e+00, -7.39738863e-01],
       [ 1.88755107e+00,  8.35725082e-01],
       [-2.40078350e+00, -8.18529469e-01],
       [-2.43017051e+00,  2.25149528e+00],
       [ 6.50875179e+00,  2.01779985e+00],
       [ 4.87777177e+00,  1.66832880e+00],
       [-3.20884589e-01, -7.67974751e-01],
       [ 1.01775823e+00, -4.14753660e-01],
       [-2.46413973e+00,  2.78457874e+00],
       [-2.44507526e+00, -8.29428594e-01],
       [ 6.02141520e-01, -1.28073036e+00],
       [-1.32772651e-01, -7.45960483e-01],
       [ 3.71228678e+00, -1.68469832e+00],
       [ 3.86735400e+00,  5.84810269e-01],
       [ 3.63273998e+00,  2.69770492e+00],
       [-3.06440692e+00,  1.18430859e+00],
       [ 2.99269328e+00,  4.36819191e-03],
       [ 1.00316581e+00, -2.06797775e+00],
       [-2.65633465e+00, -2.74943403e+00],
       [ 2.75284142e-01, -1.21256911e+00],
       [ 5.34729684e+00,  5.34867220e-01],
       [-2.69857067e+00, -3.05516168e-01],
       [-1.19555198e+00, -1.56606747e+00],
       [-5.46867551e-02,  7.10952688e-01],
       [ 1.13374286e+00, -1.09704821e+00],
       [ 9.31381639e-01, -2.37800815e+00],
       [-1.77274064e+00, -1.20908771e+00],
       [ 1.07545601e+00, -3.30169745e+00],
       [ 3.18039178e+00, -1.72598486e+00],
       [-3.20140029e+00, -3.66761268e-01],
       [-2.45265046e+00, -1.09522143e+00],
       [-6.09623831e-01, -2.16774787e+00],
       [ 4.19629362e+00,  2.88844373e-01],
       [-8.13024776e-01,  6.74135979e-01],
       [ 3.21792102e+00, -2.17157363e+00],
       [ 2.75397374e+00,  1.55373289e+00],
       [ 3.99965190e-01,  6.03441390e-03],
       [ 2.28821382e-01,  2.13210497e+00],
       [ 1.10983999e+00,  2.56796529e-01],
       [ 3.56531327e+00,  3.79703887e-01],
       [-3.59602464e+00, -3.22186393e-01],
       [-3.30368864e-01, -1.18964938e+00],
       [-2.51885304e+00, -7.99446935e-02],
       [-2.18952323e+00,  1.26858541e+00],
       [ 2.89395053e+00,  8.59902897e-02],
       [ 2.96455241e+00,  8.88974117e-01],
       [-8.05398308e-01, -8.89930682e-01],
       [ 1.89444828e+00, -1.01423695e+00],
       [ 3.89898789e+00,  2.74104382e+00],
       [ 3.40115446e+00, -1.46623603e+00],
       [-3.27639643e-01,  2.19188614e+00],
       [ 9.50302179e-01, -4.28467937e-01],
       [-1.42559930e+00,  1.16832970e+00],
       [-7.01947444e-02, -5.02355280e-01],
       [-1.65565543e+00, -4.30587568e-01],
       [ 5.73343019e+00,  1.50101437e+00],
       [-3.76911509e-01, -7.69376329e-01],
       [ 1.99402868e+00, -1.65433182e+00],
       [-4.71912528e-01, -2.69977341e+00],
       [ 2.37816740e+00, -9.53778230e-01],
       [-1.96728348e+00,  5.49805639e-01],
       [-1.23288318e+00, -6.49184944e-01],
       [-1.15765620e+00, -1.98417634e+00],
       [ 8.24692059e+00,  2.54665248e+00],
       [ 1.61960508e+00,  5.77531649e-01],
       [ 2.76451774e+00,  1.76829382e+00],
       [-1.30906083e+00,  1.29170342e+00],
       [ 3.13350683e+00, -1.94894326e+00],
       [-2.78432018e+00,  3.03008856e+00],
       [-8.45143471e-01,  1.77948228e-01],
       [-1.11518258e+00,  8.91119241e-02],
       [ 6.62175158e-01, -1.40530716e+00],
       [-2.02549035e+00, -2.35229402e-01],
       [-1.89636353e+00, -3.83971390e-01],
       [-1.04108906e+00,  1.55229552e+00],
       [ 5.92004667e-01,  2.18734841e+00],
       [ 1.79448034e+00, -3.04515717e+00],
       [-7.16705410e-01, -7.84753486e-01],
       [ 3.59264995e+00, -4.73598784e-02],
       [ 5.96369315e-01, -1.49682352e+00],
       [-2.47422575e+00, -1.20878025e-01],
       [ 8.28920506e-01, -2.25971769e+00],
       [ 3.26595400e+00, -1.37114206e+00],
       [-3.61628499e-01, -1.25934606e+00],
       [-9.57890683e-01, -2.60073403e+00],
       [-1.91596391e+00, -9.13735444e-01],
       [ 2.69398113e-01, -1.20988639e+00],
       [-2.53892300e-01,  1.89680009e-01],
       [-1.78548711e+00, -9.21353820e-01],
       [-1.78798239e+00,  2.72383778e+00],
       [ 5.51674606e-01, -2.15278069e+00],
       [ 2.54655732e-01, -6.17384611e-01],
       [-4.51148924e-01, -1.08104641e+00],
       [-2.21177015e+00,  7.22333887e-01],
       [-1.82278242e+00,  2.05847942e-02],
       [ 3.07282200e-01, -3.31547650e-01],
       [ 2.48129284e+00, -1.63706683e+00],
       [-1.57632369e+00,  9.79735574e-01],
       [ 3.69841974e+00,  1.51004555e+00],
       [ 1.58917532e-01, -1.82897503e+00],
       [-3.18222172e+00,  2.86739755e-01],
       [-5.16876840e-01,  2.25759619e-01],
       [-3.13378943e+00, -6.06768803e-01],
       [ 5.24577500e-01, -4.74949353e-02],
       [ 3.70357779e+00, -1.69558235e+00],
       [-1.58792535e-01, -2.45263684e-01],
       [ 4.36988029e+00,  8.60982568e-01],
       [ 1.11561923e+00,  1.52290917e+00],
       [-1.67257571e+00, -1.50422926e-01],
       [-3.38315000e+00, -1.09305734e+00],
       [ 3.85984599e+00,  2.33996049e+00],
       [-2.49747556e-01, -9.80131565e-01],
       [ 4.98126385e-01, -5.98488159e-01],
       [-1.81495567e+00,  2.30506695e+00],
       [ 4.61835667e+00, -1.13549778e+00],
       [-2.28001942e+00,  2.12275065e-01],
       [-1.37065892e+00,  1.16562343e-02],
       [-9.88149685e-01, -9.76676474e-01],
       [-4.07954919e-01, -2.65545210e+00],
       [ 6.03911808e-01, -1.02691513e-01],
       [-2.14000392e+00,  1.46799100e+00],
       [-1.85682269e+00, -8.08676803e-01],
       [-3.39123375e+00,  4.52357203e-01],
       [ 3.52922904e-01, -3.32939726e-01],
       [-4.33065915e+00,  9.91902488e-01],
       [ 2.45318213e+00,  8.56786518e-01],
       [-2.23650189e+00,  1.55487422e+00],
       [-2.10212555e+00,  2.74391594e-01],
       [-2.68177486e+00,  3.42908161e+00],
       [-6.50028179e-02, -2.73161535e-01],
       [-3.08916399e+00, -8.70875685e-01],
       [-5.91746812e-02, -2.76186359e-01],
       [-8.60957888e-01,  6.24809298e-01],
       [-2.21082142e+00,  2.72732530e+00],
       [-1.05770430e+00, -3.66215415e+00],
       [ 2.44139362e-01,  2.01278002e+00],
       [-2.43978829e+00,  3.07290990e-01],
       [-2.10650873e+00, -6.92506424e-01],
       [-3.52923973e-01,  1.65786901e-01],
       [-1.86436193e-01,  4.79214969e-01],
       [-2.02336840e+00, -1.40074254e-01],
       [ 4.63034023e-01, -6.64755824e-02],
       [-2.88942875e+00, -2.73719486e-02],
       [ 2.23561257e+00, -1.15629338e-01],
       [ 1.81718838e+00, -6.43812179e-02],
       [ 1.52547802e+00, -1.15088479e+00],
       [ 1.43279825e+00,  1.43535599e+00],
       [-1.55958622e+00, -2.79569080e-01],
       [ 2.44724811e+00,  6.25571158e-01],
       [-3.36875530e-01,  3.67176523e+00],
       [ 3.08693232e+00, -1.78589058e+00],
       [-2.22417657e+00, -1.64065183e+00],
       [ 2.32781047e+00, -3.53702853e-01],
       [ 3.07785389e+00, -1.50385306e+00],
       [-2.76609605e+00,  8.66910376e-01],
       [ 6.52566474e-01, -3.28009746e-01],
       [-3.00131293e+00,  1.45839820e+00],
       [ 3.41259025e-02, -1.28182135e-02],
       [ 1.18347353e+00,  4.16947196e-01],
       [ 2.43177958e+00, -2.28031866e+00],
       [ 9.58281856e-01, -2.00809415e+00],
       [-9.04530014e-01, -7.49861880e-01],
       [-1.81317435e+00, -3.58522055e-01],
       [ 1.50239647e+00, -2.57364583e-01],
       [ 6.98398027e-01, -2.08482531e+00],
       [ 2.43748533e-01, -1.38319311e+00],
       [ 7.35087282e-01,  1.85494518e+00],
       [ 3.15371584e-01, -8.10563902e-02],
       [-5.47791780e-01,  1.85838734e+00],
       [ 1.86755031e+00,  1.26761837e+00],
       [ 3.95647361e-01, -2.79847886e-01],
       [-4.35414283e+00,  6.25103249e-01],
       [-2.34452713e+00, -1.52555336e+00],
       [-2.06957052e+00,  2.40354676e+00],
       [-3.47317949e+00,  4.70451460e-02],
       [ 2.11344623e+00, -1.02177579e+00],
       [-3.98665127e+00, -1.25659392e+00],
       [ 2.94674612e+00, -1.74114614e-01],
       [ 2.17666614e-01,  3.28555479e-01],
       [ 1.38330917e-01, -2.08089075e+00],
       [ 1.79532948e+00,  1.54232738e+00],
       [-1.78835779e+00,  1.20101597e+00],
       [-2.32887859e+00, -1.55207317e-01],
       [-2.38841819e+00,  1.41330139e+00],
       [-6.34711976e-01, -4.79627156e-01],
       [-2.58020361e+00,  6.25139120e-01],
       [-9.37717720e-01, -4.62758415e-02],
       [-9.86985845e-02, -1.52024273e+00],
       [-2.42590389e+00,  1.27853060e+00],
       [ 5.55036983e-01,  5.72680917e-03],
       [-2.54401993e+00,  2.20388894e+00],
       [-2.51638707e-01,  1.63984745e+00],
       [-1.05303240e+00, -1.52146351e-01],
       [ 1.38250390e+00,  2.52129640e+00],
       [-3.46427081e+00,  9.20146922e-01],
       [ 3.71939571e+00,  1.80504033e+00],
       [ 5.99819545e-01, -1.85698585e-01],
       [ 4.62019370e+00,  2.48683196e+00],
       [-3.64409814e-01,  1.25353164e+00],
       [-1.69233074e-01, -1.80812471e+00],
       [ 1.96238569e+00, -1.16735314e+00],
       [-6.29577669e-02,  3.35455389e-01],
       [-1.53546477e+00, -2.41605843e-01],
       [ 4.30938340e+00, -2.35132954e+00],
       [-6.06903926e-01, -4.70377067e-01],
       [ 2.21661329e+00,  9.71933747e-02],
       [ 3.04282014e+00,  2.18979891e+00],
       [ 3.69014412e+00, -1.82136640e+00],
       [ 2.70559816e-01, -1.65938255e-01],
       [ 4.10394136e+00, -1.54186225e+00],
       [ 1.51559284e+00,  3.46707858e-01],
       [-8.04695267e-01, -2.24758468e-01],
       [-1.85885629e+00, -5.59831066e-01],
       [-5.16854087e-01,  4.33392141e-01],
       [ 1.13882246e+00,  2.17890716e+00],
       [-1.39428244e+00, -1.47420177e-01],
       [ 3.21042374e+00, -1.56765882e+00],
       [-1.25185050e+00,  3.46940652e+00],
       [ 2.24438129e+00, -1.28232515e+00],
       [ 3.08746154e+00, -1.39500902e+00],
       [-1.57387566e+00, -8.85380807e-01],
       [-1.39187114e+00, -3.63792156e+00],
       [ 7.92013667e-01,  4.89656935e-01],
       [ 3.50848030e+00, -2.00133849e+00],
       [ 1.33171351e+00, -6.16441186e-01],
       [ 4.95652861e-01, -4.82143161e-01],
       [ 6.82196734e-01,  6.40091568e-01],
       [ 2.43517043e+00,  2.08166961e+00],
       [-1.86339010e-01,  4.21944167e+00],
       [-1.26953837e+00,  2.22652917e-01],
       [-1.81933604e+00, -1.06303748e+00],
       [ 1.00215048e+00, -1.88726277e-01],
       [ 1.69471429e+00, -1.65805262e+00],
       [-5.96741407e-01, -4.87401971e-01],
       [-1.12219184e+00, -3.50511741e-01],
       [-1.92214393e+00,  1.98740755e+00],
       [ 9.89079473e-01, -2.52507760e+00],
       [-4.07818180e+00, -2.53975908e+00],
       [ 3.96706220e+00, -7.72726037e-01],
       [ 1.81558040e+00,  3.98626479e-01],
       [ 2.79866636e+00, -1.57846739e+00],
       [ 5.92221987e-03, -4.41519810e-01],
       [-9.17712349e-01, -3.95711611e-01],
       [-1.62761760e+00,  4.33091103e-02],
       [ 5.82477554e+00, -2.83478642e+00],
       [ 1.26104474e+00,  2.16293750e+00],
       [ 6.88904705e-02, -5.90563937e-01],
       [ 3.84324643e+00,  8.76573173e-02],
       [-1.52897659e+00, -1.04287007e+00],
       [-1.63584542e+00, -1.87040630e+00],
       [-1.57286381e+00, -4.07094360e-01],
       [ 4.08491227e-01,  1.44140746e+00],
       [-1.31881495e+00, -2.21994897e+00],
       [ 3.38472766e-02,  4.47504802e+00],
       [ 6.91831989e-01, -3.41069016e-01],
       [ 2.86225423e+00, -4.35290287e-01],
       [-1.40809999e-01, -1.40453530e-01],
       [ 2.59674308e+00, -1.79098849e+00],
       [-1.47732777e+00,  4.37659131e+00],
       [ 6.68796026e-01, -2.50680677e+00],
       [-1.01866194e+00, -2.02094156e+00],
       [-3.96294287e-01,  1.24949178e+00],
       [-1.99802979e+00, -1.52843057e+00],
       [-4.08560905e+00,  2.78063063e+00],
       [ 5.56780091e-01, -3.42487408e-01],
       [ 1.08087535e+00, -2.82120464e-01],
       [ 1.04948785e+00,  2.11237749e+00],
       [-1.46748145e+00, -1.04283228e+00],
       [-7.16890614e-01, -1.18109243e-01],
       [-3.29405618e+00, -1.15114457e+00],
       [ 2.32363722e-01, -1.31241236e+00],
       [ 2.09057768e-01, -2.73220820e+00],
       [ 1.76339172e+00, -1.51527341e+00],
       [ 1.67366165e+00,  1.17810767e+00],
       [-6.56474627e-02, -7.81406720e-01],
       [-4.75619311e+00,  2.14778204e-01],
       [-3.50282301e+00, -9.20896167e-01],
       [-7.29122761e-01,  7.24071637e-01],
       [ 3.50887188e-02, -2.84150837e+00],
       [-1.70156673e+00, -5.15768821e-01],
       [ 1.53363191e+00,  1.08950182e+00],
       [ 1.92909456e+00, -1.36937232e+00],
       [ 7.98320198e-01, -1.57284625e+00],
       [-1.20713101e+00,  2.51696638e+00],
       [ 1.59017572e+00, -6.60777853e-01],
       [ 7.54503398e-01, -1.89400649e+00],
       [-4.24469851e+00, -5.01354797e-01],
       [-9.73608007e-02,  2.13295477e+00],
       [-1.00284689e+00, -5.86631925e-01],
       [ 2.77778877e+00, -7.47727499e-01],
       [-1.42602324e+00, -1.04680294e+00],
       [-2.41288072e+00,  1.53198863e+00],
       [-3.30543565e+00,  3.99574493e-01],
       [ 8.09378577e-01, -2.33350508e+00],
       [ 7.93951689e-01,  3.91151228e+00],
       [-3.80398929e+00,  7.64371361e-01],
       [ 2.17068901e-01, -5.57936077e-01],
       [ 4.46538066e-01, -4.46126921e-01],
       [-3.76520311e-01, -8.54731945e-01],
       [ 9.28566985e-01, -2.79685321e+00],
       [ 2.12052130e-01, -1.64326002e+00],
       [ 2.81943563e+00, -1.45368676e-01],
       [-4.62940328e-01, -1.22356111e+00],
       [-1.14873835e+00, -1.95487362e+00],
       [ 3.17335534e+00,  2.93056829e-01],
       [ 2.62183971e+00, -1.60707392e+00],
       [-2.31105104e-01,  4.21899933e-01],
       [ 2.17332469e-01, -1.01486682e-01],
       [-8.96745989e-01, -1.61273816e-01],
       [ 1.66338964e+00, -3.28957830e-02],
       [ 3.99640621e+00,  3.39454129e+00],
       [ 5.64066166e+00,  1.21628484e+00],
       [ 5.57799915e-01,  1.26861624e-01],
       [-4.57856779e-02, -1.09273522e+00],
       [ 2.57489940e+00, -1.40230030e+00],
       [-1.21918775e-02,  1.29147714e+00],
       [-2.74383615e+00, -8.62573830e-01],
       [ 3.91191053e+00,  1.31948002e+00],
       [ 8.73179852e-02, -1.11242976e+00],
       [-2.20054456e+00, -1.20815707e+00],
       [-1.58233779e+00,  9.18079330e-02],
       [-2.38355150e+00, -8.34470636e-01],
       [ 5.49452416e-02, -2.21502808e+00],
       [ 1.97079928e+00,  1.00809263e+00],
       [ 1.31037255e+00,  2.23880046e-01],
       [-3.98499143e+00, -6.16605614e-01],
       [ 1.09099656e+00,  1.53979631e+00],
       [ 1.19287725e-01, -4.72687656e-01],
       [ 1.49249568e+00,  4.40674660e-01],
       [ 1.31982908e+00,  4.72695079e-01],
       [ 4.44789138e+00,  1.40907860e+00],
       [-1.23409599e+00,  1.45572599e+00],
       [-2.64026037e+00, -6.19356619e-01],
       [ 2.15796524e+00,  1.43985214e-01],
       [ 1.84073279e+00,  1.43395998e+00],
       [ 6.06253113e-01, -6.42431247e-01],
       [-1.22734384e+00, -1.72533409e+00],
       [-2.82398401e+00,  1.54302468e+00],
       [-2.35682173e+00,  1.30272353e+00],
       [ 2.84583603e+00,  1.32717608e-01],
       [ 3.18004639e+00, -1.91290875e-01]])
In [402]:
################################################# Linear Regression ###########################################################
In [405]:
## Import linear regression model library.
from sklearn.linear_model import LinearRegression
In [432]:
## Instantiate regression model and fit  a model.
linreg=LinearRegression()
linear_model=linreg.fit(train_data_final,y_train)
In [433]:
## Get the predictions on train and validation data.
pred_train = linear_model.predict(train_data_final)
pred_test = linear_model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = linear_model.predict(test_data_combine)
In [434]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 18987.647924827896
Test Error: 213230686032402.3
In [ ]:
#There is an indication given in the result that there might exist a strong multicollinearity in the data. 
#Lets use variance inflation factor (VIF) to understand if there exist a multicollinearity and remove those attributes.
In [519]:
## Import VIF library and get VIF vallues for  train data.
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif=pd.DataFrame()
vif['Vif']=[variance_inflation_factor(train_data_final.values,i) for i in range(train_data_final.shape[1])]
vif['Variables']=train_data_final.columns.values
C:\Users\nagar\Anaconda3\lib\site-packages\statsmodels\stats\outliers_influence.py:181: RuntimeWarning: divide by zero encountered in double_scalars
  vif = 1. / (1. - r_squared_i)
In [520]:
## Display VIF values for train data.
vif
Out[520]:
Vif Variables
0 3.427303 LotFrontage
1 3.999449 LotArea
2 19.790053 YearBuilt
3 4.192063 YearRemodAdd
4 3.712573 MasVnrArea
5 inf BsmtFinSF1
6 inf BsmtFinSF2
7 inf BsmtUnfSF
8 inf TotalBsmtSF
9 inf 1stFlrSF
10 inf 2ndFlrSF
11 inf LowQualFinSF
12 inf GrLivArea
13 4.432669 BedroomAbvGr
14 7.904046 TotRmsAbvGrd
15 6.608873 GarageYrBlt
16 9.654663 GarageCars
17 10.397869 GarageArea
18 1.720052 WoodDeckSF
19 2.017527 OpenPorchSF
20 2.086494 EnclosedPorch
21 1.291584 3SsnPorch
22 1.537126 ScreenPorch
23 4011.826053 PoolArea
24 15.981804 MiscVal
25 1.423591 MoSold
26 1.437997 YrSold
27 12.261775 MSSubClass_160
28 4.485264 MSSubClass_180
29 52.742532 MSSubClass_190
30 148.036504 MSSubClass_20
31 34.876756 MSSubClass_30
32 2.750924 MSSubClass_40
33 25.913360 MSSubClass_45
34 75.497906 MSSubClass_50
35 123.866874 MSSubClass_60
36 37.687851 MSSubClass_70
37 13.421388 MSSubClass_75
38 44.791574 MSSubClass_80
39 13.353473 MSSubClass_85
40 inf MSSubClass_90
41 17.310744 MSZoning_FV
42 5.422079 MSZoning_RH
43 53.826513 MSZoning_RL
44 36.078449 MSZoning_RM
45 3.215226 Street_Pave
46 3.554404 Alley_NAA
47 3.582523 Alley_Pave
48 1.524013 LotShape_IR2
49 1.880076 LotShape_IR3
50 1.903818 LotShape_Reg
51 2.887861 LandContour_HLS
52 2.980395 LandContour_Low
53 4.024932 LandContour_Lvl
54 1.926402 Utilities_NoSeWa
55 1.950570 LotConfig_CulDSac
56 1.559877 LotConfig_FR2
57 1.567949 LotConfig_FR3
58 2.052214 LotConfig_Inside
59 2.208504 LandSlope_Mod
60 3.973680 LandSlope_Sev
61 1.413362 Neighborhood_Blueste
62 5.419048 Neighborhood_BrDale
63 10.298785 Neighborhood_BrkSide
64 4.386085 Neighborhood_ClearCr
65 14.200442 Neighborhood_CollgCr
66 8.563209 Neighborhood_Crawfor
67 11.539715 Neighborhood_Edwards
68 8.353482 Neighborhood_Gilbert
69 10.310404 Neighborhood_IDOTRR
70 6.711799 Neighborhood_MeadowV
71 5.606135 Neighborhood_Mitchel
72 22.902358 Neighborhood_NAmes
73 4.257706 Neighborhood_NPkVill
74 9.953334 Neighborhood_NWAmes
75 5.938724 Neighborhood_NoRidge
76 7.855056 Neighborhood_NridgHt
77 20.591891 Neighborhood_OldTown
78 5.794227 Neighborhood_SWISU
79 9.321027 Neighborhood_Sawyer
80 8.322533 Neighborhood_SawyerW
81 13.880332 Neighborhood_Somerst
82 3.772372 Neighborhood_StoneBr
83 5.983023 Neighborhood_Timber
84 2.904420 Neighborhood_Veenker
85 4.768181 Condition1_Feedr
86 7.666565 Condition1_Norm
87 1.818198 Condition1_PosA
88 2.455862 Condition1_PosN
89 2.060942 Condition1_RRAe
90 2.922049 Condition1_RRAn
91 1.300616 Condition1_RRNe
92 2.093322 Condition1_RRNn
93 10.237745 Condition2_Feedr
94 22.657792 Condition2_Norm
95 5.059141 Condition2_PosA
96 5.122666 Condition2_PosN
97 inf Condition2_RRAe
98 2.964216 Condition2_RRAn
99 4.922677 Condition2_RRNn
100 40.937998 BldgType_2fmCon
101 inf BldgType_Duplex
102 22.616166 BldgType_Twnhs
103 47.396503 BldgType_TwnhsE
104 22.965994 HouseStyle_1.5Unf
105 56.428032 HouseStyle_1Story
106 6.019891 HouseStyle_2.5Fin
107 5.289820 HouseStyle_2.5Unf
108 37.302869 HouseStyle_2Story
109 10.332908 HouseStyle_SFoyer
110 24.742278 HouseStyle_SLvl
111 37.393888 OverallQual_10
112 8.742462 OverallQual_2
113 36.255497 OverallQual_3
114 174.456349 OverallQual_4
115 461.587063 OverallQual_5
116 436.827555 OverallQual_6
117 389.839838 OverallQual_7
118 245.516190 OverallQual_8
119 71.998566 OverallQual_9
120 inf OverallCond_2
121 inf OverallCond_3
122 inf OverallCond_4
123 inf OverallCond_5
124 inf OverallCond_6
125 inf OverallCond_7
126 inf OverallCond_8
127 inf OverallCond_9
128 262.399021 RoofStyle_Gable
129 18.323231 RoofStyle_Gambrel
130 240.734519 RoofStyle_Hip
131 11.340809 RoofStyle_Mansard
132 inf RoofStyle_Shed
133 2700.766719 RoofMatl_CompShg
134 174.363933 RoofMatl_Membran
135 173.802572 RoofMatl_Roll
136 852.637461 RoofMatl_Tar&Grv
137 517.627943 RoofMatl_WdShake
138 854.590541 RoofMatl_WdShngl
139 4.876335 Exterior1st_BrkComm
140 61.397688 Exterior1st_BrkFace
141 inf Exterior1st_CBlock
142 94.614071 Exterior1st_CemntBd
143 213.882917 Exterior1st_HdBoard
144 231.069581 Exterior1st_MetalSd
145 112.564784 Exterior1st_Plywood
146 4.062381 Exterior1st_Stone
147 34.602848 Exterior1st_Stucco
148 425.218706 Exterior1st_VinylSd
149 210.423161 Exterior1st_Wd Sdng
150 32.938163 Exterior1st_WdShing
151 4.615536 Exterior2nd_AsphShn
152 13.245295 Exterior2nd_Brk Cmn
153 30.295943 Exterior2nd_BrkFace
154 inf Exterior2nd_CBlock
155 91.359437 Exterior2nd_CmentBd
156 192.666290 Exterior2nd_HdBoard
157 10.701847 Exterior2nd_ImStucc
158 216.099506 Exterior2nd_MetalSd
159 2.861664 Exterior2nd_Other
160 134.825448 Exterior2nd_Plywood
161 4.997426 Exterior2nd_Stone
162 35.572751 Exterior2nd_Stucco
163 399.756520 Exterior2nd_VinylSd
164 194.924924 Exterior2nd_Wd Sdng
165 40.216584 Exterior2nd_Wd Shng
166 29.147382 MasVnrType_BrkFace
167 33.423333 MasVnrType_None
168 11.307175 MasVnrType_Stone
169 1.685112 MasVnrType_nan
170 8.052741 ExterQual_Fa
171 19.876674 ExterQual_Gd
172 25.635366 ExterQual_TA
173 26.442209 ExterCond_Fa
174 125.104408 ExterCond_Gd
175 4.197280 ExterCond_Po
176 148.024295 ExterCond_TA
177 7.677131 Foundation_CBlock
178 9.153874 Foundation_PConc
179 5.887688 Foundation_Slab
180 1.746870 Foundation_Stone
181 1.426486 Foundation_Wood
182 3.673268 BsmtQual_Fa
183 8.423189 BsmtQual_Gd
184 inf BsmtQual_NB
185 13.395188 BsmtQual_TA
186 3.721399 BsmtCond_Gd
187 inf BsmtCond_NB
188 inf BsmtCond_Po
189 5.514594 BsmtCond_TA
190 2.437357 BsmtExposure_Gd
191 2.133235 BsmtExposure_Mn
192 28.763158 BsmtExposure_NB
193 3.543014 BsmtExposure_No
194 2.271547 BsmtFinType1_BLQ
195 3.914006 BsmtFinType1_GLQ
196 2.075776 BsmtFinType1_LwQ
197 inf BsmtFinType1_NB
198 2.289346 BsmtFinType1_Rec
199 5.219077 BsmtFinType1_Unf
200 3.992223 BsmtFinType2_BLQ
201 2.786058 BsmtFinType2_GLQ
202 5.027166 BsmtFinType2_LwQ
203 inf BsmtFinType2_NB
204 6.671440 BsmtFinType2_Rec
205 22.516866 BsmtFinType2_Unf
206 2.367540 Heating_GasW
207 6.569004 Heating_Grav
208 1.818154 Heating_OthW
209 2.167037 Heating_Wall
210 2.415494 HeatingQC_Fa
211 1.863640 HeatingQC_Gd
212 1.634062 HeatingQC_Po
213 2.679859 HeatingQC_TA
214 3.092275 CentralAir_Y
215 2.479705 Electrical_FuseF
216 2.340425 Electrical_FuseP
217 inf Electrical_Mix
218 2.447257 Electrical_SBrkr
219 1.198662 Electrical_nan
220 3.885402 KitchenQual_Fa
221 11.535085 KitchenQual_Gd
222 14.338955 KitchenQual_TA
223 2.692766 Functional_Maj2
224 5.099321 Functional_Min1
225 6.448350 Functional_Min2
226 3.590562 Functional_Mod
227 13.171473 Functional_Typ
228 3.554932 FireplaceQu_Fa
229 18.241309 FireplaceQu_Gd
230 24.980039 FireplaceQu_NF
231 2.401982 FireplaceQu_Po
232 16.986905 FireplaceQu_TA
233 128.433846 GarageType_Attchd
234 8.687245 GarageType_Basment
235 29.559992 GarageType_BuiltIn
236 6.091227 GarageType_CarPort
237 104.829853 GarageType_Detchd
238 inf GarageType_NG
239 inf GarageFinish_NG
240 2.444350 GarageFinish_RFn
241 4.572504 GarageFinish_Unf
242 inf GarageQual_Fa
243 inf GarageQual_Gd
244 inf GarageQual_NG
245 inf GarageQual_Po
246 inf GarageQual_TA
247 inf GarageCond_Fa
248 inf GarageCond_Gd
249 inf GarageCond_NG
250 inf GarageCond_Po
251 inf GarageCond_TA
252 2.102860 PavedDrive_P
253 2.896349 PavedDrive_Y
254 inf PoolQC_Fa
255 204.949954 PoolQC_Gd
256 3606.247680 PoolQC_NP
257 2.836423 Fence_GdWo
258 4.707714 Fence_MnPrv
259 1.340711 Fence_MnWw
260 6.083139 Fence_NF
261 inf MiscFeature_NE
262 inf MiscFeature_Shed
263 inf MiscFeature_TenC
264 1.490160 SaleType_CWD
265 1.310236 SaleType_Con
266 2.142271 SaleType_ConLD
267 1.500495 SaleType_ConLI
268 2.429533 SaleType_ConLw
269 43.538066 SaleType_New
270 1.414532 SaleType_Oth
271 5.973698 SaleType_WD
272 2.085595 SaleCondition_AdjLand
273 2.235964 SaleCondition_Alloca
274 1.549010 SaleCondition_Family
275 3.882058 SaleCondition_Normal
276 40.600436 SaleCondition_Partial
In [437]:
############@@@@@@@@@########## Perform Grid Search,Ridge,Lasso ###############################################################
In [439]:
## Import Ridge,lasso model libraires.
from sklearn.linear_model import Ridge, Lasso
In [ ]:
## Ridge
In [440]:
## Import Grid search library.
from sklearn.model_selection import GridSearchCV 
## Ridge regression is parametric and takes a parameter alpha. The value of alpha determines the reduction in magnitude of coefficients.
## But we also need to check which value of alpha gives best predictions on test data. For this we experiment with several values of alpha and pick the best
## We do this by performing grid search over several values of alpha. 
alphas = np.array([1,0.1,0.01,0.001,0.0001,0,1.5,2]) ## Pick the best of these values.
## Create and fit a ridge regression model, testing each alpha.
model_ridge = Ridge()
grid = GridSearchCV(estimator=model_ridge, param_grid=dict(alpha=alphas),cv=10) ## Here the argument cv=10 implies compute error on 10 chucks of data and report average value.
grid.fit(train_data_final,y_train)
print(grid)
GridSearchCV(cv=10, error_score=nan,
             estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=None, normalize=False, random_state=None,
                             solver='auto', tol=0.001),
             iid='deprecated', n_jobs=None,
             param_grid={'alpha': array([1.0e+00, 1.0e-01, 1.0e-02, 1.0e-03, 1.0e-04, 0.0e+00, 1.5e+00,
       2.0e+00])},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
In [442]:
## Print best params.
print(grid.best_score_)
print(grid.best_estimator_.alpha)
0.7608287095091045
2.0
In [444]:
## Instantiate Ridge and fit it.
Ridge_model= Ridge(alpha=2,normalize=False)
Ridge_model.fit(train_data_final,y_train) ## Applying it on the train data, to obtain the coefficients.
Out[444]:
Ridge(alpha=2, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)
In [445]:
## Get the predictions on train and validation data.
pred_train = Ridge_model.predict(train_data_final)
pred_test = Ridge_model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = Ridge_model.predict(test_data_combine)
In [446]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 23581.943193260486
Test Error: 38789.74028716411
In [447]:
## Lasso
In [448]:
## Get best parameter vlaues by doing grid search.
model_lasso = Lasso()
grid = GridSearchCV(estimator=model_lasso, param_grid=dict(alpha=alphas),cv=10) #Here the argument cv=10 implies compute error on 10 chucks of data and report average value
grid.fit(train_data_final,y_train)
print(grid)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 132283552276.12665, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 151663669283.90045, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158740041164.45273, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 123344376302.26239, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 140781385865.65784, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 156195508585.75662, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158502548646.73212, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 162310583572.46268, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 13517905993.891144, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 165010261997.6953, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 135500461747.66463, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 155076158397.17624, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168542377853.999, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 125737659894.84619, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 147776486107.88174, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159157662341.10913, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161191554269.7159, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 165532344219.9775, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 153640045858.81238, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168866180257.60345, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 136191512314.9615, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 155768556412.25137, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169282554570.844, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 126161299358.95221, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 157142358721.08127, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159748079904.3531, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161735022525.51306, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 166155175440.84982, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 167401858539.57175, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169619094896.12466, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 136266512425.06892, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 155851191765.648, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169358827229.7449, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 126211912716.69667, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159633335235.47568, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159817012919.41357, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161798971794.1304, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 166227498302.9395, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168726366547.3222, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169707860342.087, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 136274057865.86304, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 155859610588.30753, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169366475270.95752, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 126217069343.087, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159878364429.8121, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159824010387.80765, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161805476601.50433, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 166234845671.58163, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168858320100.98807, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169716893665.36902, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 136274896570.39502, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 155860547770.64722, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169367325036.09665, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 126217643387.33908, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159905544326.53275, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159824788664.6952, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161806200612.03137, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 166235663357.0879, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168872976003.698, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
  estimator.fit(X_train, y_train, **fit_params)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169717899135.38824, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 131917434426.06377, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 151619836495.39517, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 149420849916.41656, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 123406506682.24747, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 143205247619.85175, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 156085028299.17822, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158535821556.43405, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 162112141735.13416, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 2248601315.1430054, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 164730061152.02676, tolerance: 612724923.8869787
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 132097157478.19542, tolerance: 586416997.6105675
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 152040330790.3219, tolerance: 600453299.974236
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 135269185357.28479, tolerance: 620994499.8141209
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 123909094784.74976, tolerance: 615809968.3676016
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 145432817081.4537, tolerance: 568848936.1139773
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 156409232806.73566, tolerance: 595517884.9173822
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158987043816.8599, tolerance: 621049274.6193085
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 162330488914.3558, tolerance: 599486045.6069565
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1576156088.5111694, tolerance: 600818404.6004866
  positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 164960209951.37195, tolerance: 612724923.8869787
  positive)
GridSearchCV(cv=10, error_score=nan,
             estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
                             max_iter=1000, normalize=False, positive=False,
                             precompute=False, random_state=None,
                             selection='cyclic', tol=0.0001, warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'alpha': array([1.0e+00, 1.0e-01, 1.0e-02, 1.0e-03, 1.0e-04, 0.0e+00, 1.5e+00,
       2.0e+00])},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 181238574198.1571, tolerance: 669162392.3363616
  positive)
In [449]:
## Display  best parameters.
print(grid.best_score_)
print(grid.best_estimator_.alpha)
0.6587272788673987
1.0
In [450]:
## Instantiate Lasso and fit it.
Lasso_model= Lasso(alpha=1.0,normalize=False)
Lasso_model.fit(train_data_final,y_train) ## Applying it on the train data, to obtain the coefficients.
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 181238574198.1571, tolerance: 669162392.3363616
  positive)
Out[450]:
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
In [451]:
## Get the predictions on train and validation data.
pred_train = Lasso_model.predict(train_data_final)
pred_test = Lasso_model.predict(test_data_final)
In [ ]:
## Get predictions on test data.
test_pred = Lasso_model.predict(test_data_combine)
In [452]:
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 19105.464258594904
Test Error: 260630.3318911821